Background: It has repeatedly been shown that interacting protein families tend to have similar phylogenetic trees. These similarities can be used to predict the mapping between two families of interacting proteins (i.e. which proteins from one family interact with which members of the other). The correct mapping will be the one that maximizes the similarity between the trees. The two families may comprise both orthologs and paralogs if their members are present in more than one organism. This fact can be exploited to restrict the possible mappings, simply by disallowing links between proteins of different organisms. We present here an algorithm to predict the mapping between families of interacting proteins that can incorporate information regarding orthologs, or any other assignment of proteins to "classes" that may restrict the possible mappings.
Results: For the first time among methods for predicting mappings, we have tested this new approach on a large number of interacting protein domains in order to statistically assess its performance. The method reaches an accuracy of around 80% in the most favourable cases. We also analysed in detail the results of the method for a well-defined case of interacting families, the sensor and kinase components of the Ntr-type two-component system, for which up to 98% of the pairings predicted by the method were correct.
Conclusion: Based on the well-established relationship between tree similarity and interactions, we developed a method for predicting the mapping between two interacting families using genomic information alone. The program is available through a web interface.
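The idea of choosing the mapping that maximizes tree similarity, while only allowing links within the same organism, can be illustrated with a minimal sketch. This is not the published algorithm, whose search procedure is not described in the abstract. It is an exhaustive search over a toy input, and it assumes that tree similarity is approximated by the Pearson correlation between the two families' pairwise distance matrices and that each organism contributes the same number of proteins to both families. All function and parameter names (`pearson`, `mapping_score`, `best_mapping`, `dist_a`, `org_a`, ...) are hypothetical.

```python
from itertools import permutations, product
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equally long sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def mapping_score(dist_a, dist_b, mapping):
    """Correlate distances in family A with distances between the mapped partners in B.

    mapping[i] is the index in family B paired with protein i of family A.
    Assumption: this correlation stands in for tree similarity.
    """
    n = len(mapping)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    xs = [dist_a[i][j] for i, j in pairs]
    ys = [dist_b[mapping[i]][mapping[j]] for i, j in pairs]
    return pearson(xs, ys)


def best_mapping(dist_a, dist_b, org_a, org_b):
    """Exhaustively search the mappings that never link proteins of different organisms.

    org_a[i] and org_b[j] are organism (or, more generally, class) labels.
    Only feasible for small families: the search space is the product of
    the factorials of the per-organism group sizes.
    """
    groups_a, groups_b = {}, {}
    for i, organism in enumerate(org_a):
        groups_a.setdefault(organism, []).append(i)
    for j, organism in enumerate(org_b):
        groups_b.setdefault(organism, []).append(j)

    # Simplifying assumption of this sketch: identical group sizes in both families.
    if set(groups_a) != set(groups_b) or any(
        len(groups_a[o]) != len(groups_b[o]) for o in groups_a
    ):
        raise ValueError("each organism must contribute equally to both families")

    organisms = sorted(groups_a)
    options = [list(permutations(groups_b[o])) for o in organisms]

    best, best_score = None, float("-inf")
    for choice in product(*options):
        mapping = [None] * len(org_a)
        for organism, perm in zip(organisms, choice):
            for i, j in zip(groups_a[organism], perm):
                mapping[i] = j
        score = mapping_score(dist_a, dist_b, mapping)
        if score > best_score:
            best, best_score = mapping, score
    return best, best_score
```

Restricting links to the same organism shrinks the search space from n! candidate mappings to the product of the factorials of the per-organism group sizes, which is why the class information is useful. For realistically sized families an exhaustive search is still impractical, and a heuristic search would be needed instead.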
Bibliographical note
Funding Information:
We are especially grateful to Antonio Rausell for interesting discussions and advice, particularly about the statistical analysis. This work was funded in part by the projects BIO2006-15318 and PIE 200620I240 from the Spanish Ministry for Education and Science, and the European Union Projects LSHG-CT-2004-503567 (GENEFUN), LSHG-CT-2003-503265 (BIOSAPIENS), LSHG-CT-2004-512092 (EMBRACE) and LSHG-CT-2004-503568 (COMBIO). Computer support was provided by the Barcelona Supercomputer Centre (BSC) through the project BCV-2006-4-0010.