In the second half of the 1990s the focus of our research shifted towards the study of transmembrane proteins. We have shown that the sequence alignment of transmembrane proteins requires special methods and that the topology of transmembrane proteins corresponds to a state of maximum likelihood, which can be identified by a hidden Markov model. These findings are the basis of our widely used DAS and HMMTOP topology prediction algorithms, of the DAS-TMfilter method for identifying transmembrane proteins and, in part, of the PSORT-B method for predicting the subcellular localization of proteins. Furthermore, we have developed an algorithm for determining the exact location and orientation of the membrane in X-ray protein structures (see the TMDET web server). The weekly updated PDB_TM database is assembled using this algorithm. Two additional databases, TOPDB and TOPDOM, containing chemical and biochemical data, were constructed as well. All three databases are equipped with search engines and are publicly available on the internet. Apart from general theoretical and methodological work, the group has conducted significant studies of ABC transporter proteins and unveiled the transmembrane origin of prion proteins.
To achieve the proposed aim we have developed various topology prediction methods (DAS, DAS-TMfilter and HMMTOP) and collected publicly available topology information in databases (PDBTM and TOPDB).
HMMTOP
[Figure 1]
The HMMTOP method [1] is based on the hypothesis that the localization of the transmembrane segments and the topology are determined by the difference in the amino acid distributions in the various structural parts of these proteins rather than by the specific amino acid compositions of these parts. A hidden Markov model with a special architecture was developed to search for the transmembrane topology corresponding to the maximum likelihood among all possible topologies of a given protein. In other words, HMMTOP uses a standard hidden Markov model in an unsupervised fashion: the model parameters are optimized for each protein to predict its topology. The model consists of five states corresponding to the structural parts of membrane proteins: membrane helix (h), inside and outside helix tail (i and o), and inside and outside loop (I and O). Two connected tails form a short loop associated with the membrane, while a tail-loop-tail sequence forms a long loop in the cytosol or in the extra-cytosolic space (Figure 1).
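As a minimal sketch of the maximum-likelihood search, the following Python code runs Viterbi decoding over the structural states named above. Everything numerical here is an assumption made for illustration: the transition table, the three residue classes, the emission probabilities (which lean on the well-known tendency of positively charged residues to sit on the inside) and the start distribution are toy values, not HMMTOP parameters. The helix state is also duplicated by direction (`h>` and `h<`) so that every helix crosses the membrane; that duplication is an assumption of this sketch. Unlike HMMTOP, the parameters are fixed rather than optimized per protein, and no length limits are placed on helices or tails.

```python
import math

# Toy residue classes (assumptions, not the HMMTOP alphabet).
HYDROPHOBIC = set("AILMFVWC")
POSITIVE = set("KR")

# I/O = inside/outside loop, i/o = inside/outside tail,
# "h>" / "h<" = helix running inside->outside / outside->inside.
STATES = ["I", "i", "h>", "o", "O", "h<"]

# Allowed transitions with illustrative probabilities (each row sums to 1).
TRANS = {
    "I": {"I": 0.9, "i": 0.1},
    "i": {"i": 0.7, "I": 0.1, "h>": 0.2},
    "h>": {"h>": 0.9, "o": 0.1},
    "o": {"o": 0.7, "O": 0.1, "h<": 0.2},
    "O": {"O": 0.9, "o": 0.1},
    "h<": {"h<": 0.9, "i": 0.1},
}

# Illustrative emission probabilities per residue class.
INSIDE = {"hyd": 0.25, "pos": 0.30, "other": 0.45}
OUTSIDE = {"hyd": 0.25, "pos": 0.05, "other": 0.70}
HELIX = {"hyd": 0.85, "pos": 0.02, "other": 0.13}
EMIT = {"I": INSIDE, "i": INSIDE, "o": OUTSIDE, "O": OUTSIDE,
        "h>": HELIX, "h<": HELIX}

# A sequence may start in any non-helix state.
START = {"I": 0.25, "i": 0.25, "o": 0.25, "O": 0.25}


def residue_class(aa):
    if aa in HYDROPHOBIC:
        return "hyd"
    if aa in POSITIVE:
        return "pos"
    return "other"


def predict_topology(seq):
    """Return the most likely state path as a string over the letters I, i, h, o, O."""
    first = residue_class(seq[0])
    score = {s: math.log(START[s] * EMIT[s][first]) if s in START else -math.inf
             for s in STATES}
    back = []
    for aa in seq[1:]:
        cls = residue_class(aa)
        new_score, pointers = {}, {}
        for s in STATES:
            best_prev, best = None, -math.inf
            for p in STATES:
                if s in TRANS[p] and score[p] > -math.inf:
                    cand = score[p] + math.log(TRANS[p][s])
                    if cand > best:
                        best_prev, best = p, cand
            pointers[s] = best_prev
            new_score[s] = best + math.log(EMIT[s][cls]) if best_prev else -math.inf
        back.append(pointers)
        score = new_score
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return "".join(s[0] for s in reversed(path))


if __name__ == "__main__":
    toy = "MKRKRSS" + "L" * 20 + "SNSTDGSNST"
    print(predict_topology(toy))
```

On the toy sequence the hydrophobic stretch is labelled as helix, the positively charged N-terminus as inside and the C-terminus as outside. With realistic parameters the same dynamic programming step would stay unchanged; only the tables would differ.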
[Figure 2]
The sum of the divergence values between the amino acid distribution of each structural part and the distribution of residues in the whole protein measures how much the amino acid distributions of the structural parts differ. This sum differs from the log-likelihood only by a constant. Therefore the topology of a membrane protein can be determined if its amino acid sequence can be segmented into parts (e.g. inside, outside and membrane) in such a way that the product of the relative frequencies of the amino acids of these segments along the sequence is maximal. Using more types of structural parts or placing some controls on the lengths of the various segments may enhance the power of the method. To solve this task (i.e. to maximize the product of relative frequencies over an amino acid sequence), a standard hidden Markov model (HMM) was used. The architecture of the model is shown in Figure 2.
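The relation between the divergence sum and the log-likelihood can be checked numerically. The sketch below assumes that each part's divergence is the Kullback-Leibler divergence from the whole-protein distribution and that it is weighted by the number of residues in that part; the text does not spell out the weighting, so this is an assumption. Under it, the weighted sum equals the log of the product of relative frequencies minus a term that depends only on the sequence, not on the segmentation. All function names are hypothetical.

```python
import math
from collections import Counter


def counts_by_part(seq, labels):
    """Amino acid counts for each structural part of a labelled sequence."""
    counts = {}
    for aa, part in zip(seq, labels):
        counts.setdefault(part, Counter())[aa] += 1
    return counts


def log_likelihood(seq, labels):
    """Log of the product, along the sequence, of within-part relative frequencies."""
    total = 0.0
    for part_counts in counts_by_part(seq, labels).values():
        n_part = sum(part_counts.values())
        for n in part_counts.values():
            total += n * math.log(n / n_part)
    return total


def weighted_divergence_sum(seq, labels):
    """Sum over parts of (part length) x KL(part distribution || whole-protein distribution)."""
    whole = Counter(seq)
    n_all = len(seq)
    total = 0.0
    for part_counts in counts_by_part(seq, labels).values():
        n_part = sum(part_counts.values())
        for aa, n in part_counts.items():
            p = n / n_part
            q = whole[aa] / n_all
            total += n_part * p * math.log(p / q)
    return total


def sequence_constant(seq):
    """Segmentation-independent term: log-likelihood of the unsegmented sequence."""
    n_all = len(seq)
    return sum(n * math.log(n / n_all) for n in Counter(seq).values())


if __name__ == "__main__":
    seq = "MKRSLLIVALLSNDT"
    for labels in ("iiiihhhhhhhoooo", "ooohhhhhhhhhiii"):
        gap = log_likelihood(seq, labels) - weighted_divergence_sum(seq, labels)
        print(labels, round(gap, 6), round(sequence_constant(seq), 6))
```

For any labelling of the same sequence the printed gap is the same number, so maximizing one quantity maximizes the other.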
[Figure 3]
We provide a web server for predicting the topology of transmembrane proteins with the HMMTOP method [2]. This version of HMMTOP lets users add additional information about segment localization to enhance prediction accuracy. This option not only enhances prediction accuracy but also helps in the interpretation of experimental results (see below). HMMTOP was the first method able to incorporate experimental results into the prediction.
DAS and DAS-TMfilter
[Figure 4]
The "Dense Alignment Surface" (DAS) algorithm [3] is based on
low-stringency dot-plots of the query sequence against a collection of
library sequences - non-homologous membrane proteins - using a previously
derived, special scoring matrix. The method provides a high-precision
hydrophobicity profile for the query, from which the location of the
potential transmembrane segments can be obtained. The principal difference
between the DAS method and methods based on a hydrophobicity profile is
that DAS describes the hydrophobic segments at three levels. First, an
ideal TM fragment is similar to any other, as they are all made up of
hydrophobic residues. Secondly, if two fragments are similar, then the
similarity remains high even when the two fragments are shifted relative
to each other. Finally, when there are several TM fragments in both
sequences, we expect to see a grid-like arrangement of similarity regions
at the intersections of the TM fragments. This complex treatment of
hydrophobicity is the key to the sensitivity of the DAS method.
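The dot-plot idea can be illustrated with a small Python sketch. It is not the published DAS algorithm: the scoring function below is a toy stand-in for the special scoring matrix (1 when both residues are hydrophobic, 0 otherwise), and the window length, the use of the best diagonal window per library sequence and the cutoff are all assumptions chosen to keep the example short.

```python
# Toy residue class standing in for the DAS scoring matrix (assumption).
HYDROPHOBIC = set("AILMFVWC")


def pair_score(a, b):
    """1.0 when both residues are hydrophobic, else 0.0 (not the DAS matrix)."""
    return 1.0 if a in HYDROPHOBIC and b in HYDROPHOBIC else 0.0


def dot_plot_profile(query, library, window=9):
    """Per-residue profile of the query from diagonal windows in query-vs-library dot-plots.

    For each library sequence, every query residue receives the best mean score
    of any diagonal window covering it; the profile averages this over the library.
    """
    profile = [0.0] * len(query)
    for lib in library:
        best = [0.0] * len(query)
        for i in range(len(query) - window + 1):
            for j in range(len(lib) - window + 1):
                s = sum(pair_score(query[i + k], lib[j + k]) for k in range(window)) / window
                for k in range(window):
                    if s > best[i + k]:
                        best[i + k] = s
        for i, value in enumerate(best):
            profile[i] += value / len(library)
    return profile


def segments_above(profile, cutoff):
    """Half-open (start, end) index ranges where the profile stays at or above the cutoff."""
    segments, start = [], None
    for i, value in enumerate(profile):
        if value >= cutoff and start is None:
            start = i
        elif value < cutoff and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(profile)))
    return segments


if __name__ == "__main__":
    query = "SNDTGSNDTG" + "LIVALIVALIVALIVALIVA" + "SNDTGSNDTG"
    library = ["GGSS" + "L" * 20 + "SSGG"]
    print(segments_above(dot_plot_profile(query, library), cutoff=0.95))
```

Because a hydrophobic stretch scores well against any hydrophobic stretch and at any shift, many diagonals through the same region score highly, which is the property the three levels described above rely on.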
DAS-TMfilter, the modified and updated version of the DAS algorithm [4,5],
uses a second - "reversed" - prediction cycle. In this step the query
sequence is used to predict TM segments in the sequences of the
TM-library. The result of the prediction is compared with the location of
the known TM segments, and the quality of the prediction is computed
according to the success rate. "Strong" profiles with high quality scores
are obtained when the query is a real TM-protein, whereas non-TM queries
result in "weak" profiles and low quality scores. Based on this test the
type of the query can be judged. The error rate of wrong assignment is
significantly lower than that of the direct application of any
TM-prediction method alone. In this way, automated screening for
TM-proteins in unannotated genomic data seems to be possible at a
reasonably low error rate.
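The success-rate test of the reversed cycle can be sketched as follows. The quality measure used here (the fraction of known library TM segments that are overlapped by a predicted segment), the minimum overlap and the decision cutoff are placeholders chosen for illustration; the actual scoring and thresholds of DAS-TMfilter are not given in this text. The function names and the library identifiers are hypothetical.

```python
def overlap_length(a, b):
    """Number of shared positions between two half-open (start, end) segments."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def reversed_cycle_quality(predicted, known, min_overlap=5):
    """Fraction of known TM segments in the library recovered by the query-driven prediction.

    predicted, known: dict mapping a library sequence id to a list of (start, end) segments.
    """
    hits = total = 0
    for lib_id, known_segments in known.items():
        for segment in known_segments:
            total += 1
            if any(overlap_length(segment, p) >= min_overlap
                   for p in predicted.get(lib_id, [])):
                hits += 1
    return hits / total if total else 0.0


def looks_transmembrane(quality, cutoff=0.5):
    """Judge the query type from the quality score (cutoff is a placeholder)."""
    return quality >= cutoff


if __name__ == "__main__":
    known = {"libA": [(10, 30), (50, 70)], "libB": [(5, 25)]}
    strong = {"libA": [(12, 29), (52, 68)], "libB": [(6, 24)]}
    weak = {"libA": [(28, 33)], "libB": []}
    for name, predicted in (("strong", strong), ("weak", weak)):
        q = reversed_cycle_quality(predicted, known)
        print(name, q, looks_transmembrane(q))
```

A query that recovers the known library segments yields a high score and is accepted as a TM-protein; a query that recovers none is rejected.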
PDBTM and TMDET
Transmembrane proteins are hard to find in the structure data bank (PDB [6]), as the annotation of these entries is rather poor. Even if a protein is identified as a transmembrane one, the possible location of the lipid bilayer is not indicated in the PDB, because these proteins are crystallized without their natural lipid bilayer and no method was publicly available to detect the possible membrane plane from the atomic coordinates of membrane proteins. We have developed a new geometrical approach to distinguish between transmembrane and globular proteins using structural information only and to locate the most likely position of the lipid bilayer. An automated algorithm (TMDET [7]) determines the membrane planes relative to the atomic coordinates, together with a discrimination function that is able to separate transmembrane and globular proteins even for low-resolution or incomplete structures such as fragments or parts of large multi-chain complexes. This method can be used for the proper annotation of protein structures containing transmembrane segments, and it paves the way to an up-to-date, automatically updated database containing the structures of all known transmembrane proteins and fragments (PDBTM [8,9]). The algorithm is equally important for constructing databases consisting purely of globular proteins.
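The geometrical idea of locating a membrane slab from coordinates alone can be sketched in Python. This is not the TMDET objective function or search procedure: the score below counts only hydrophobic against non-hydrophobic residues inside a slab, the slab half-width of 15 Å is an assumed value, and the candidate normals are supplied by the caller instead of being sampled over all directions. All names are hypothetical.

```python
import math


def unit(v):
    """Scale a 3-vector to unit length."""
    norm = math.sqrt(sum(c * c for c in v))
    return tuple(c / norm for c in v)


def slab_score(points, normal, center, half_width):
    """Hydrophobic minus non-hydrophobic residues inside the slab (toy objective)."""
    score = 0
    for xyz, hydrophobic in points:
        distance = sum(a * b for a, b in zip(xyz, normal)) - center
        if abs(distance) <= half_width:
            score += 1 if hydrophobic else -1
    return score


def find_membrane_slab(points, normals, half_width=15.0, step=1.0):
    """Scan slab positions along each candidate normal; return (score, normal, center).

    points: list of ((x, y, z), is_hydrophobic), e.g. one entry per C-alpha atom.
    """
    best = None
    for normal in map(unit, normals):
        projections = [sum(a * b for a, b in zip(xyz, normal)) for xyz, _ in points]
        center, stop = min(projections), max(projections)
        while center <= stop:
            score = slab_score(points, normal, center, half_width)
            if best is None or score > best[0]:
                best = (score, normal, center)
            center += step
    return best


if __name__ == "__main__":
    # Synthetic column of residues along z, hydrophobic only in the middle 30 Å.
    column = [((0.0, 0.0, -30.0 + 1.5 * k), abs(-30.0 + 1.5 * k) <= 15.0)
              for k in range(41)]
    print(find_membrane_slab(column, [(1, 0, 0), (0, 1, 0), (0, 0, 1)]))
```

On the synthetic column the best slab is perpendicular to the z axis and centred at zero. A discrimination step between transmembrane and globular proteins could, under the same assumptions, threshold the best score, but no such threshold is proposed here.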
TOPDB
[Figure 5]
The database collects the details of various experiments carried out
to learn about the topology of particular transmembrane proteins. The
experimental techniques include fusion with reporter enzymes,
glycosylation studies, protease accessibility, immunolocalisation, etc.
In addition to literature-derived data, an extensive collection of
structural data was also compiled from PDB and from PDBTM by utilising the TMDET algorithm.
While
literature-derived data cannot be collected automatically, data based
on 3D structures provide semi-automatic and continuously updated
information for the database. Structural data are the most reliable
source of information about transmembrane topologies, but the topology
information is often incomplete. Therefore, for each protein in the
database the most probable topology consistent with the collected
experimental constraints was also calculated using the HMMTOP transmembrane topology prediction algorithm.
Each
record in TOPDB also contains essential information about the
given protein, such as its sequence, name, organism and cross-references
to various databases (PDB, PDBTM, UniProt and literature references from PubMed).
The web interface of TOPDB includes tools for extensive searching, relational querying and data browsing as well as visualisation tools for topology data.
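The requirement that a topology be consistent with the collected experimental constraints can be made concrete with a short Python sketch. How HMMTOP applies constraints internally is not described here; the code below simply filters a hypothetical list of scored candidate topologies, and the data layout (per-residue label strings, 1-based positions mapped to allowed labels) is an assumption.

```python
def consistent_with_constraints(topology, constraints):
    """True if every constrained position carries one of its allowed labels.

    topology: per-residue label string, e.g. "iiihhhhooo".
    constraints: dict mapping a 1-based residue position to a string of allowed labels.
    """
    return all(topology[pos - 1] in allowed for pos, allowed in constraints.items())


def best_consistent_topology(candidates, constraints):
    """Highest-scoring candidate that satisfies all constraints, or None.

    candidates: list of (log_likelihood, topology) pairs from any predictor.
    """
    consistent = [c for c in candidates
                  if consistent_with_constraints(c[1], constraints)]
    return max(consistent, key=lambda c: c[0])[1] if consistent else None


if __name__ == "__main__":
    candidates = [(-10.0, "iiihhhhooo"), (-12.0, "ooohhhhiii")]
    # Hypothetical experiment: residue 1 was found on the outside.
    print(best_consistent_topology(candidates, {1: "oO"}))
```

Without constraints the higher-scoring candidate is returned; a single localization experiment is enough to switch the choice to the other orientation, and contradictory constraints leave no consistent topology.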
References:
1. Tusnády GE and Simon I (1998) Principles governing amino acid composition of integral membrane proteins: application to topology prediction. J Mol Biol 283, 489-506
2. Tusnády GE and Simon I (2001) The HMMTOP transmembrane topology prediction server. Bioinformatics 17, 849-50
3. Cserző M, Wallin E, Simon I, von Heijne G and Elofsson A (1997) Prediction of transmembrane alpha-helices in prokaryotic membrane proteins: the dense alignment surface method. Protein Eng 10, 673-6
4. Cserző M, Eisenhaber F, Eisenhaber B and Simon I (2002) On filtering false positive transmembrane protein predictions. Protein Eng 15, 745-52
5. Cserző M, Eisenhaber F, Eisenhaber B and Simon I (2004) TM or not TM: transmembrane protein prediction with low false positive rate using DAS-TMfilter. Bioinformatics 20, 136-7
6. Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN and Bourne PE (2000) The Protein Data Bank. Nucleic Acids Res 28, 235-42
7. Tusnády GE, Dosztányi Z and Simon I (2005) TMDET: web server for detecting transmembrane regions of proteins by using their 3D coordinates. Bioinformatics 21, 1276-7
8. Tusnády GE, Dosztányi Z and Simon I (2004) Transmembrane proteins in the Protein Data Bank: identification and classification. Bioinformatics 20, 2964-72
9. Tusnády GE, Dosztányi Z and Simon I (2005) PDB_TM: selection and membrane localization of transmembrane proteins in the protein data bank. Nucleic Acids Res 33, D275-8