Center of Excellence of the European Union
 
 
Functional Genomics

 

Group leader: Prof. Laszlo Patthy

Bioinformatics

 

Prediction of the structure and function of proteins

We perform complex bioinformatic analyses of genomic and protein sequence data to predict the structure and function of novel proteins, primarily from Metazoa. These analyses include assignment of novel proteins/domains to known domain families, prediction of their most probable function, prediction of their subcellular localization, and analysis of structure-function aspects of the proteins.
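Predictions of subcellular localization usually begin by locating long hydrophobic stretches that may act as signal peptides or transmembrane helices. As a minimal illustration (not the group's own pipeline), the sketch below implements the classic Kyte-Doolittle sliding-window hydropathy scan. The 19-residue window and the 1.6 threshold are the values Kyte and Doolittle suggested for transmembrane segments; the function names are hypothetical. A plain hydropathy scan cannot reliably tell signal peptides from transmembrane helices, so dedicated predictors are normally preferred in practice.

```python
from typing import List, Tuple

# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 19) -> List[float]:
    """Mean hydropathy of every full window; index i is the window starting at residue i."""
    seq = seq.upper()
    if len(seq) < window:
        return []
    values = [KD[aa] for aa in seq]  # raises KeyError on non-standard residues
    total = sum(values[:window])
    profile = [total / window]
    # Slide the window one residue at a time, updating the running sum.
    for i in range(window, len(values)):
        total += values[i] - values[i - window]
        profile.append(total / window)
    return profile


def candidate_tm_segments(seq: str, window: int = 19,
                          threshold: float = 1.6) -> List[Tuple[int, int]]:
    """Merge overlapping above-threshold windows into 0-based, half-open (start, end) spans."""
    segments: List[Tuple[int, int]] = []
    for start, score in enumerate(hydropathy_profile(seq, window)):
        if score > threshold:
            end = start + window
            if segments and start <= segments[-1][1]:
                segments[-1] = (segments[-1][0], end)  # extend the current span
            else:
                segments.append((start, end))  # open a new span
    return segments
```

For example, `candidate_tm_segments("K" * 10 + "L" * 25 + "K" * 10)` returns `[(5, 40)]`: the merged windows extend somewhat beyond the 25-residue leucine stretch, so the span marks an approximate, not exact, segment boundary.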

     As a result of such analyses, we have identified novel multidomain proteins (e.g. WFIKKN1 and WFIKKN2) [1],[2] and have defined several novel protein domain types (e.g. the NTR [3], PAN [4], WIF [5] and LCCL [6] domains).

          

Genome Annotation

            In the 21st century, medical sciences, drug development, agriculture, biotechnology and ecological sciences increasingly rely on information originating from genome projects.

            In recognition of the importance of genome annotation, the BioSapiens Network of Excellence (NoE) was formed in 2003 to provide a large-scale, concerted effort by laboratories distributed across Europe to annotate genome data. The NoE comprised bioinformatics researchers from 25 institutions in 14 European countries (http://www.biosapiens.info/page.php).

            Our team was invited to join the BioSapiens NoE; we participated in Work Package 101 (Gene definition and alternative splicing), primarily addressing problems of gene definition.

The first and most crucial step in the interpretation of genome sequences is the identification of protein-coding genes and the prediction of their structure with bioinformatic tools. The success of all subsequent biological research that exploits genomic sequences depends on the quality of these data. The difficulty of gene identification is illustrated by the fact that, despite significant improvements in gene-prediction technologies, prediction of the structure of protein-coding genes remains unreliable: according to current estimates, the structure of fewer than 50% of predicted human genes is correct [7]. To address this problem, our team developed, in the MisPred project, a method that helps to decide whether an experimentally determined or in silico predicted protein-coding sequence is erroneous (abnormal, incomplete or mispredicted). The MisPred approach is based on the principle that a protein-coding gene is likely to be mispredicted if some of its features (or features of the protein it encodes) conflict with our current knowledge about protein-coding genes and proteins [8],[9],[22].
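The principle above can be illustrated with a few consistency rules. The sketch below is hypothetical and is not the MisPred implementation: it assumes the protein's features (signal peptide, number of transmembrane segments, domains with their lengths) have already been predicted, and it checks them against a small illustrative table of domain knowledge. The domain table, its approximate lengths, the 50% truncation cut-off and all names are assumptions chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Illustrative (hypothetical) domain knowledge: expected location and an
# approximate typical length in residues. Not a curated reference set.
DOMAIN_KNOWLEDGE: Dict[str, Tuple[str, int]] = {
    "Kunitz": ("extracellular", 55),
    "SH2": ("cytoplasmic", 100),
}


@dataclass
class PredictedProtein:
    name: str
    has_signal_peptide: bool
    tm_segments: int
    # (domain name, length of that domain as it appears in this prediction)
    domains: List[Tuple[str, int]] = field(default_factory=list)


def conflicts(p: PredictedProtein, min_fraction: float = 0.5) -> List[str]:
    """Return reasons why a prediction conflicts with prior knowledge (empty list = none found)."""
    reasons: List[str] = []
    locations = {DOMAIN_KNOWLEDGE[d][0] for d, _ in p.domains if d in DOMAIN_KNOWLEDGE}

    # Rule 1: an extracellular domain needs a route out of the cytoplasm.
    if "extracellular" in locations and not p.has_signal_peptide and p.tm_segments == 0:
        reasons.append("extracellular domain but no signal peptide or transmembrane segment")

    # Rule 2: extracellular and cytoplasmic domains in one chain imply a membrane crossing.
    if {"extracellular", "cytoplasmic"} <= locations and p.tm_segments == 0:
        reasons.append("extracellular and cytoplasmic domains but no transmembrane segment")

    # Rule 3: a domain much shorter than usual suggests a truncated (incomplete) prediction.
    for d, length in p.domains:
        if d in DOMAIN_KNOWLEDGE and length < min_fraction * DOMAIN_KNOWLEDGE[d][1]:
            reasons.append(f"{d} domain only {length} residues long, possibly truncated")
    return reasons
```

A prediction with a signal peptide and a full-length Kunitz domain passes all checks, whereas `PredictedProtein("suspect", False, 0, [("Kunitz", 20), ("SH2", 98)])` is flagged by all three rules.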

            Identification of erroneous genes/proteins is of crucial importance, since it protects users from drawing wrong conclusions from erroneous data [23-25]. It is equally important, however, to correct the errors in these sequences. Given the heavy contamination of sequence databases with incorrect sequences, erroneous entries can only be identified and corrected by automated quality-control and correction protocols. In our FixPred project we developed tools for the automatic correction of erroneous sequences and, in several cases, verified the corrected predictions experimentally [23].

            The next step of genome annotation is the definition/prediction of all aspects of the molecular functions and biological roles of the various genes/proteins in health and disease. This is essential if we wish to assess the relevance of each gene for medicine, agriculture or biotechnology. 

            As part of our TargetPred project, we are developing bioinformatic tools for predicting the molecular function and biological roles of proteins, with a view to selecting proteins that are likely to be useful in medicine as drug targets. The underlying principle of the team's expert system for selecting drug target candidates is that human drug targets are not random representatives of the human proteome.
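One simple way to exploit the principle that drug targets are not a random sample of the proteome is to measure how strongly each protein feature is over-represented among known targets, then score candidates by the features they carry. The sketch below uses generic naive-Bayes-style log-odds scores with Laplace pseudocounts; it is a swapped-in, textbook technique rather than the team's expert system, and the feature labels and function names are hypothetical.

```python
import math
from typing import Dict, List, Set


def feature_log_odds(targets: List[Set[str]], proteome: List[Set[str]],
                     pseudocount: float = 1.0) -> Dict[str, float]:
    """log2(frequency among known targets / frequency in the whole proteome) per feature.

    Features are treated as present/absent, so the Laplace pseudocount is added
    once to the count and twice to the denominator.
    """
    features = set().union(*targets, *proteome)
    scores: Dict[str, float] = {}
    for f in features:
        in_targets = sum(f in t for t in targets)
        in_proteome = sum(f in p for p in proteome)
        p_target = (in_targets + pseudocount) / (len(targets) + 2 * pseudocount)
        p_background = (in_proteome + pseudocount) / (len(proteome) + 2 * pseudocount)
        scores[f] = math.log2(p_target / p_background)
    return scores


def rank_candidates(candidates: Dict[str, Set[str]],
                    scores: Dict[str, float]) -> List[str]:
    """Rank proteins by the summed log-odds of their features (assumes feature independence)."""
    return sorted(candidates,
                  key=lambda name: sum(scores.get(f, 0.0) for f in candidates[name]),
                  reverse=True)


# Hypothetical toy data: known targets plus the rest of a tiny "proteome".
known_targets = [{"membrane", "enzyme"}, {"membrane"}, {"secreted", "enzyme"}]
proteome = known_targets + [{"nuclear"}, {"nuclear"}, {"nuclear", "enzyme"}, {"cytoplasmic"}]
scores = feature_log_odds(known_targets, proteome)
print(rank_candidates({"A": {"membrane", "enzyme"}, "B": {"nuclear"}}, scores))
```

Summing log-odds assumes the features are independent, which real protein features rarely are; the pseudocounts keep features never seen among known targets from producing infinite scores.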

 

Experimental studies on the structure and function of multidomain proteins            

             

   We combine the results of our in silico protein/domain predictions with experimental techniques to determine the 3D structure and to study the function of multidomain proteins of major medical importance.

            For example, we are studying the structure and function of individual domains of the WFIKKN1 and WFIKKN2 proteins, both of which are implicated in the regulation of muscle growth [10],[11],[12],[13],[14].

Similarly, characterization of the domains of cochlin is of major medical interest since mutations affecting the LCCL and vWA domains of cochlin cause the deafness disorder DFNA9 in humans. We have shown that the majority of mutations that cause hearing loss affect structurally important residues leading to misfolding of cochlin [15],[16],[17],[18],[19].

    In collaboration with the NMR group of Gottfried Otting (Research School of Chemistry, The Australian National University, Canberra, Australia), we were the first to solve the structure of an LCCL domain [15], a WIF domain [20] and an NTR domain [21].

 

 

  References:

  1. Trexler M, Bányai L and Patthy L (2001)
    A human protein containing multiple types of protease-inhibitory modules.
    Proc Natl Acad Sci USA 98, 3705-9 [PubMed]
  2. Trexler M, Bányai L and Patthy L (2002)
    Distinct expression pattern of two related human proteins containing multiple types of protease-inhibitory modules.
    Biol Chem 383, 223-8 [PubMed]
  3. Bányai L and Patthy L (1999)
    The NTR module: domains of netrins, secreted frizzled related proteins, and type I procollagen C-proteinase enhancer protein are homologous with tissue inhibitors of metalloproteases.
    Protein Sci 8, 1636-42 [PubMed]
  4. Tordai H, Bányai L and Patthy L (1999)
    The PAN module: the N-terminal domains of plasminogen and hepatocyte growth factor are homologous with the apple domains of the prekallikrein family and with a novel domain found in numerous nematode proteins.
    FEBS Lett 461, 63-7 [PubMed]
  5. Patthy L (2000)
    The WIF module.
    Trends Biochem Sci 25, 12-3 [PubMed]
  6. Trexler M, Bányai L and Patthy L (2000)
    The LCCL module.
    Eur J Biochem 267, 5751-7 [PubMed]
  7. Harrow J, Nagy A, Reymond A, Alioto T, Patthy L, Antonarakis SE and Guigó R (2009)
    Identifying protein-coding genes in genomic sequences.
    Genome Biol 10, 201 [PubMed]
  8. Tress ML, Martelli PL, Frankish A, Reeves GA, Wesselink JJ, Yeats C, Olason PL, Albrecht M, Hegyi H, Giorgetti A, Raimondo D, Lagarde J, Laskowski RA, López G, Sadowski MI, Watson JD, Fariselli P, Rossi I, Nagy A, Kai W, Størling Z, Orsini M, Assenov Y, Blankenburg H, Huthmacher C, Ramírez F, Schlicker A, Denoeud F, Jones P, Kerrien S, Orchard S, Antonarakis SE, Reymond A, Birney E, Brunak S, Casadio R, Guigó R, Harrow J, Hermjakob H, Jones DT, Lengauer T, Orengo CA, Patthy L, Thornton JM, Tramontano A and Valencia A (2007)
    The implications of alternative splicing in the ENCODE protein complement.
    Proc Natl Acad Sci USA 104, 5495-500 [PubMed]
  9. Nagy A, Hegyi H, Farkas K, Tordai H, Kozma E, Bányai L and Patthy L (2008)
    Identification and correction of abnormal, incomplete and mispredicted proteins in public databases.
    BMC Bioinformatics 9, 353 [PubMed]
  10. Nagy A, Trexler M and Patthy L (2003)
    Expression, purification and characterization of the second Kunitz-type protease inhibitor domain of the human WFIKKN protein.
    Eur J Biochem 270, 2101-7 [PubMed]
  11. Liepinsh E, Nagy A, Trexler M, Patthy L and Otting G (2006)
    Second Kunitz-type protease inhibitor domain of the human WFIKKN1 protein.
    J Biomol NMR 35, 73-8 [PubMed]
  12. Kondás K, Szláma G, Trexler M and Patthy L (2008)
    Both WFIKKN1 and WFIKKN2 Have High Affinity for Growth and Differentiation Factors 8 and 11.
    J Biol Chem 283, 23677-84 [PubMed]
  13. Szláma G, Kondás K, Trexler M and Patthy L (2010)
    WFIKKN1 and WFIKKN2 bind growth factors TGFβ1, BMP2 and BMP4 but do not inhibit their signalling activity.
    FEBS J 277, 5040-50 [PubMed]
  14. Kondás K, Szláma G, Nagy A, Trexler M and Patthy L (2011)
    Biological functions of the WAP domain-containing multidomain proteins WFIKKN1 and WFIKKN2.
    Biochem Soc Trans 39, 1416-20 [PubMed]
  15. Liepinsh E, Trexler M, Kaikkonen A, Weigelt J, Bányai L, Patthy L and Otting G (2001)
    NMR structure of the LCCL domain and implications for DFNA9 deafness disorder.
    EMBO J 20, 5347-53 [PubMed]
  16. Usami S, Takahashi K, Yuge I, Ohtsuka A, Namba A, Abe S, Fransen E, Patthy L, Otting G and Van Camp G (2003)
    Mutations in the COCH gene are a frequent cause of autosomal dominant progressive cochleo-vestibular dysfunction, but not of Meniere's disease.
    Eur J Hum Genet 11, 744-8 [PubMed]
  17. Nagy I, Horváth M, Trexler M, Répássy G and Patthy L (2004)
    A novel COCH mutation, V104del, impairs folding of the LCCL domain of cochlin and causes progressive hearing loss.
    J Med Genet 41, e9 [PubMed]
  18. Nagy I, Trexler M and Patthy L (2008)
    The second von Willebrand type A domain of cochlin has high affinity for type I, type II and type IV collagens.
    FEBS Lett 582, 4003-7 [PubMed]
  19. Cho HJ, Park HJ, Trexler M, Venselaar H, Lee KY, Robertson NG, Baek JI, Kang BS, Morton CC, Vriend G, Patthy L and Kim UK (2012)
    A novel COCH mutation associated with autosomal dominant nonsyndromic hearing loss disrupts the structural stability of the vWFA2 domain.
    J Mol Med 90, 1321-31 [PubMed]
  20. Liepinsh E, Bányai L, Patthy L and Otting G (2006)
    NMR structure of the WIF domain of the human Wnt-inhibitory factor-1.
    J Mol Biol 357, 942-50 [PubMed]
  21. Liepinsh E, Bányai L, Pintacuda G, Trexler M, Patthy L and Otting G (2003)
    NMR structure of the netrin-like domain (NTR) of human type I procollagen C-proteinase enhancer defines structural consensus of NTR domains and assesses potential proteinase inhibitory activity and ligand binding.
    J Biol Chem 278, 25982-9 [PubMed]
  22. Nagy A, Hegyi H, Farkas K, Tordai H, Kozma E, Bányai L and Patthy L (2008)
    Quality control of gene predictions.
    In: Frishman D and Valencia A (eds), Modern Genome Annotation: The BioSapiens Network. SpringerWienNewYork
  23. Nagy A, Szláma G, Szarka E, Trexler M, Bányai L and Patthy L (2011)
    Reassessing Domain Architecture Evolution of Metazoan Proteins: Major Impact of Gene Prediction Errors.
    Genes 2, 449-501
  24. Nagy A, Bányai L and Patthy L (2011)
    Reassessing Domain Architecture Evolution of Metazoan Proteins: Major Impact of Errors Caused by Confusing Paralogs and Epaktologs.
    Genes 2, 516-561
  25. Nagy A and Patthy L (2011)
    Reassessing Domain Architecture Evolution of Metazoan Proteins: The Contribution of Different Evolutionary Mechanisms.
    Genes 2, 578-598