Medical Genetics

Predicting the pathogenicity of missense mutations in the TCF4 gene

https://doi.org/10.25557/2073-7998.2024.12.16-21

Abstract

Background. Most missense variants discovered to date are of unknown clinical significance. Classifying such variants is therefore a pressing problem in medical genetics. When the clinical significance of a variant cannot be established, it becomes harder to diagnose inherited diseases and to develop or apply therapeutic strategies. In this work, AlphaMissense, a new bioinformatics tool, was used to assess how efficiently variants in the TCF4 gene can be classified.

Aim. To predict the pathogenic effect of all possible missense variants in the TCF4 gene with AlphaMissense, a machine-learning-based tool, and to evaluate the tool's ability to classify variants using ROC analysis.

Methods. The data discussed in this paper were generated and analysed in the Google Colab environment using Python v3.10, the Biopython library for working with biological sequences, and the scikit-learn library for ROC analysis. The TCF4 gene sequence from the NCBI database served as the reference. In total, 1,241,319 single nucleotide polymorphism (SNP) variants were generated. Of these, 6,906 lie in the coding sequence, and 3,747 of those were identified as missense variants. The resulting data were annotated against the ClinVar and AlphaMissense databases using the OpenCRAVAT tool. Of all the missense variants detected, 979 had an AlphaMissense score, and only 101 of these were listed in ClinVar.
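The abstract does not include the authors' code. The following is a minimal sketch of in silico saturation mutagenesis. It assumes the input is an already-extracted coding sequence (CDS) written in A/C/G/T. The sketch enumerates the three alternative bases at every position and translates the affected codon to decide whether the change is missense. The authors used Biopython for sequence handling. To keep the example dependency-free, it hard-codes the standard genetic code (NCBI translation table 1) instead; Biopython's `Seq.translate()` would give the same amino acids. The names `saturate`, `classify` and `Variant` are hypothetical. Each position has exactly three alternative bases, so the number of variants is three times the sequence length. Both counts in the Methods (1,241,319 and 6,906) are divisible by three, which is consistent with this.

```python
from collections import Counter
from typing import Iterator, NamedTuple

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); codons in TTT, TTC, TTA, TTG, TCT, ... order
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS))


class Variant(NamedTuple):  # hypothetical record type
    cds_pos: int  # 1-based position in the coding sequence
    ref: str
    alt: str
    ref_aa: str
    alt_aa: str
    consequence: str


def classify(ref_aa: str, alt_aa: str) -> str:
    """Consequence of a single amino-acid change ('*' = stop codon)."""
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"
    if ref_aa == "*":
        return "stop_lost"
    return "missense"


def saturate(cds: str) -> Iterator[Variant]:
    """Yield every single-nucleotide substitution of an A/C/G/T coding sequence."""
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("CDS length must be a multiple of 3")
    for i, ref in enumerate(cds):
        start, offset = i - i % 3, i % 3
        codon = cds[start:start + 3]
        for alt in "ACGT":
            if alt == ref:
                continue  # only the three alternative bases are substitutions
            mutant = codon[:offset] + alt + codon[offset + 1:]
            ref_aa, alt_aa = CODON_TABLE[codon], CODON_TABLE[mutant]
            yield Variant(i + 1, ref, alt, ref_aa, alt_aa, classify(ref_aa, alt_aa))


variants = list(saturate("ATGTGG"))  # toy CDS (Met-Trp), not TCF4
print(len(variants))  # 18
print(Counter(v.consequence for v in variants))
missense = [v for v in variants if v.consequence == "missense"]
```

The authors mutated the whole gene sequence, of which the coding part is only a small fraction. Non-coding variants would therefore need separate handling before the missense subset is passed on for annotation.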

Results. A comparison of sensitivity (Se), specificity (Sp), ROC curves and area under the curve (AUC) values showed that the classification of SNPs as likely pathogenic clearly stands out (AUC = 0.81, Se = 0.68, Sp = 0.78). This classification can therefore be used as an additional criterion when screening candidate variants for Pitt-Hopkins syndrome. By contrast, the likely benign and ambiguous classifications lack sensitivity and specificity, and their AUC values characterise them as models of medium quality. Variants assigned to these groups therefore need to be reassessed with other tools.
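The abstract reports Se, Sp and AUC but does not describe how they were set up. The following is a minimal sketch that assumes a one-vs-rest evaluation. ClinVar pathogenic or likely pathogenic variants form the positive class. A variant is called likely pathogenic when its AlphaMissense score exceeds 0.564, which is the published AlphaMissense cut-off. AUC is computed from the continuous score. The formulas are Se = TP/(TP + FN) and Sp = TN/(TN + FP). AUC is computed as the probability that a randomly chosen positive variant scores higher than a randomly chosen negative one, with ties counting one half; this quantity equals the area under the ROC curve. The authors used scikit-learn, where `sklearn.metrics.roc_auc_score` returns the same value. The pure-Python functions `sensitivity_specificity` and `roc_auc` are hypothetical helpers that only make the arithmetic explicit. The toy labels and scores are invented for illustration and are not study data.

```python
from typing import Sequence, Tuple

# Published AlphaMissense cut-off for the "likely pathogenic" class (assumed to be the call used)
LIKELY_PATHOGENIC_MIN = 0.564


def sensitivity_specificity(y_true: Sequence[int], y_pred: Sequence[bool]) -> Tuple[float, float]:
    """Se = TP / (TP + FN), Sp = TN / (TN + FP) for binary labels and calls."""
    pairs = [(bool(t), bool(p)) for t, p in zip(y_true, y_pred)]
    tp = sum(1 for t, p in pairs if t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(random positive outscores random negative), ties counted as 0.5."""
    pos = [s for s, t in zip(scores, y_true) if t]
    neg = [s for s, t in zip(scores, y_true) if not t]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (NOT from the study): 1 = ClinVar pathogenic/likely pathogenic, 0 = benign/likely benign
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.95, 0.70, 0.40, 0.60, 0.20, 0.10]
y_pred = [s > LIKELY_PATHOGENIC_MIN for s in scores]

se, sp = sensitivity_specificity(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(f"Se = {se:.2f}, Sp = {sp:.2f}, AUC = {auc:.2f}")  # Se = 0.67, Sp = 0.67, AUC = 0.89
```

With real data, `y_true` would come from the ClinVar annotation of the variants that also have an AlphaMissense score, and `scores` from AlphaMissense. The exact definitions behind the per-class AUC values in the abstract are not given, so this setup should not be read as a reconstruction of the authors' analysis.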

Conclusions. The measured values show that AlphaMissense performs best at identifying likely pathogenic variants. Variants classified as likely benign or ambiguous, however, should be treated as questionable and checked with other tools. Variants generated by in silico mutagenesis that AlphaMissense scores as likely pathogenic, but that are not yet listed in databases, may help to identify previously unknown pathogenic variants in the TCF4 gene. These variants may also support diagnosis and the development of therapies for associated diseases.
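The selection of novel candidates described in the conclusion can be sketched as a simple filter. These candidates are variants from in silico mutagenesis that AlphaMissense scores as likely pathogenic but that are absent from databases. The sketch assumes the published AlphaMissense class cut-offs: likely benign below 0.34, likely pathogenic above 0.564, and ambiguous in between. The exact handling of the boundary values here is an assumption. The `Variant` record and the `am_class` and `novel_candidates` functions are hypothetical, and `clinvar` is `None` when a variant has no ClinVar entry. The identifiers and scores are placeholders, not TCF4 data.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional

# AlphaMissense class cut-offs (Cheng et al., 2023); boundary handling here is an assumption
LIKELY_BENIGN_MAX = 0.34
LIKELY_PATHOGENIC_MIN = 0.564


def am_class(score: float) -> str:
    """Map an AlphaMissense score in [0, 1] to its class label."""
    if score < LIKELY_BENIGN_MAX:
        return "likely_benign"
    if score > LIKELY_PATHOGENIC_MIN:
        return "likely_pathogenic"
    return "ambiguous"


@dataclass
class Variant:  # hypothetical record, not an OpenCRAVAT output format
    variant_id: str
    am_score: float
    clinvar: Optional[str] = None  # ClinVar significance, or None if not listed


def novel_candidates(variants: Iterable[Variant]) -> List[Variant]:
    """Variants that are likely pathogenic by AlphaMissense and absent from ClinVar."""
    return [v for v in variants
            if am_class(v.am_score) == "likely_pathogenic" and v.clinvar is None]


# Placeholder records (not real TCF4 variants or scores)
variants = [
    Variant("var1", 0.91),
    Variant("var2", 0.88, clinvar="Pathogenic"),
    Variant("var3", 0.45),
    Variant("var4", 0.12),
]
print([v.variant_id for v in novel_candidates(variants)])  # ['var1']
```

Using `None` rather than an empty string for `clinvar` makes the "not listed in ClinVar" state explicit in the filter.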

About the Authors

S. N. Gosudarkina
Tomsk National Research Medical Center of the Russian Academy of Sciences, Research Institute of Medical Genetics
Russian Federation

Sophia N. Gosudarkina

10, Naberejnaya Ushaiki, Tomsk, 634050



R. R. Savchenko
Tomsk National Research Medical Center of the Russian Academy of Sciences, Research Institute of Medical Genetics
Russian Federation

10, Naberejnaya Ushaiki, Tomsk, 634050



N. A. Skryabin
Tomsk National Research Medical Center of the Russian Academy of Sciences, Research Institute of Medical Genetics
Russian Federation

10, Naberejnaya Ushaiki, Tomsk, 634050





For citations:


Gosudarkina S.N., Savchenko R.R., Skryabin N.A. Predicting the pathogenicity of missense mutations in the TCF4 gene. Medical Genetics. 2024;23(12):16-21. (In Russ.) https://doi.org/10.25557/2073-7998.2024.12.16-21


ISSN 2073-7998 (Print)