Functional Genomics
Group leader: Prof. Laszlo Patthy
Bioinformatics
Prediction of the structure and function of proteins
We perform complex bioinformatic analyses of genomic and protein sequence data to predict the structure and function of novel proteins, primarily from Metazoa. These analyses include assigning novel proteins and domains to known domain families, predicting their most probable function, and predicting the subcellular localization and structure-function aspects of the proteins. As a result of such analyses we have identified novel multidomain proteins (e.g. WFIKKN1, WFIKKN2) [1],[2] and have defined several novel protein domain types (e.g. the NTR-domains [3], PAN-domains [4], WIF-domains [5], LCCL-domains [6]).
Genome Annotation
In the 21st century, medical sciences, drug development, agriculture, biotechnology and ecological sciences rely increasingly on information originating from genome projects. In recognition of the importance of genome annotation, the BioSapiens Network of Excellence (NoE) was formed in 2003 to provide a large-scale, concerted effort to annotate genome data by laboratories distributed around Europe. The NoE comprised bioinformatics researchers from 25 institutions based in 14 countries throughout Europe (http://www.biosapiens.info/page.php). Our team was invited to join the BioSapiens NoE; we participated in Work Package 101 of BioSapiens (Gene definition and alternative splicing), primarily addressing the problems of gene definition.
The first and most crucial step in the interpretation of genome sequences is the identification of protein-coding genes and prediction of their structure with bioinformatic tools. The success of all subsequent steps of biological research exploiting genomic sequences depends on the quality of these data. The difficulty of gene identification is illustrated by the fact that, despite significant improvements in gene-prediction technologies, prediction of the structure of protein-coding genes remains unreliable: according to current estimates, the structure of less than 50% of predicted human genes is correct [7]. To address this problem, in the MisPred project our team developed a method that helps to decide whether an experimentally determined or in silico predicted protein-coding sequence is erroneous (abnormal, incomplete or mispredicted). The MisPred approach is based on the principle that a protein-coding gene is likely to be mispredicted if some of its features (or features of the protein it encodes) conflict with our current knowledge about protein-coding genes and proteins [8],[9],[22].
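The principle that a sequence is suspect when its features conflict with established knowledge can be illustrated with a minimal rule-based sketch. This is not the MisPred implementation: the `PredictedProtein` record, the `EXTRACELLULAR_DOMAINS` set, the single example rule and all names below are hypothetical, chosen only to show how such a conflict check might be organised.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class PredictedProtein:
    """Hypothetical annotation record; the fields are illustrative assumptions."""
    name: str
    has_signal_peptide: bool
    has_transmembrane_segment: bool
    domains: List[str] = field(default_factory=list)


# Placeholder domain classification (assumption, not real annotation data).
EXTRACELLULAR_DOMAINS = {"HYPOTHETICAL_EXTRACELLULAR_DOMAIN"}

# A rule inspects a protein and returns a conflict message, or None if consistent.
Rule = Callable[[PredictedProtein], Optional[str]]


def extracellular_domain_without_export_signal(protein: PredictedProtein) -> Optional[str]:
    """Illustrative rule: an extracellular domain with no apparent export route."""
    has_extracellular = any(d in EXTRACELLULAR_DOMAINS for d in protein.domains)
    can_be_exported = protein.has_signal_peptide or protein.has_transmembrane_segment
    if has_extracellular and not can_be_exported:
        return "extracellular domain present but no signal peptide or transmembrane segment"
    return None


def find_conflicts(protein: PredictedProtein, rules: List[Rule]) -> List[str]:
    """Collect every conflict between the protein's features and the rules."""
    conflicts = []
    for rule in rules:
        message = rule(protein)
        if message is not None:
            conflicts.append(message)
    return conflicts


RULES: List[Rule] = [extracellular_domain_without_export_signal]

# Two made-up records: one conflicts with the rule, one does not.
suspect = PredictedProtein("example_1", False, False, ["HYPOTHETICAL_EXTRACELLULAR_DOMAIN"])
plausible = PredictedProtein("example_2", True, False, ["HYPOTHETICAL_EXTRACELLULAR_DOMAIN"])

print(find_conflicts(suspect, RULES))    # one conflict message
print(find_conflicts(plausible, RULES))  # empty list
```

A non-empty result marks the sequence as a candidate for closer inspection rather than as proof of an error; further rules could be appended to `RULES` without changing `find_conflicts`.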
Identification of erroneous genes/proteins is of crucial importance since it may protect users from drawing erroneous conclusions based on erroneous data [23-25]. It is equally important to correct the errors in these sequences. Given the heavy contamination of sequence databases with incorrect sequences, identification of erroneous entries and their correction is feasible only with automated quality control and correction protocols. In our FixPred project we developed tools for the automatic correction of erroneous sequences and in several cases verified the corrected predictions experimentally [23].
The next step of genome annotation is the definition/prediction of all aspects of the molecular functions and biological roles of the various genes/proteins in health and disease. This is essential if we wish to assess the relevance of each gene for medicine, agriculture or biotechnology. As part of our TargetPred project we are developing bioinformatic tools for the prediction of the molecular function and biological roles of proteins, with a view to selecting proteins that are likely to be useful for medicine as drug targets. The underlying principle of the team’s expert system for the selection of drug target candidates is that human drug targets are not random representatives of the human proteome.
Experimental studies on the structure and function of multidomain proteins
We combine the results of our in silico protein/domain predictions with experimental techniques to determine the 3D structure and to study the function of multidomain proteins of major medical importance.
For example, we are studying the structure and function of individual domains of the WFIKKN1 and WFIKKN2 proteins, both of which are implicated in the regulation of muscle growth [10],[11],[12],[13],[14]. Similarly, characterization of the domains of cochlin is of major medical interest since mutations affecting the LCCL and vWA domains of cochlin cause the deafness disorder DFNA9 in humans. We have shown that the majority of mutations that cause hearing loss affect structurally important residues, leading to misfolding of cochlin [15],[16],[17],[18],[19]. In collaboration with the NMR group of Gottfried Otting (Research School of Chemistry, The Australian National University, Canberra, Australia) we were the first to solve the structure of an LCCL-domain [15], a WIF-domain [20] and an NTR-domain [21].
References: