At A Glance
Name: Alexei Vazquez
Position: Postdoctoral associate, Department of Physics, University of Notre Dame, Indiana
Prior Experience: PhD in physics, International School for Advanced Studies (ISAS/SISSA), Trieste, Italy, 2002
Master’s degree in physics, University of Havana, Cuba, 1997
Undergraduate degree in physics, University of Havana, Cuba, 1995
Published a paper in Nature Biotechnology this month entitled “Global protein function prediction from protein-protein interaction networks.”
What led you to the large-scale prediction of protein functions?
My educational background is in physics. I did the work presented in this article while working on my PhD at the International School for Advanced Studies in Trieste, Italy. My advisors, Amos Maritan and Alessandro Vespignani, had been applying statistical mechanics approaches to protein folding and to complex networks, respectively. The general topic of my thesis was the study of complex networks as they arise in different applications, from Internet modeling to biology. One of the main problems I worked on was protein-protein interaction networks. As a first step, we studied the topology of this class of networks, proposed some quantitative measures to characterize them, and developed a model of an evolving network that, in spite of its simplicity, agrees well with the experimental data. As a second step, I focused on the relation between protein interactions and protein function, and proposed a large-scale method for predicting protein function from information about protein interactions.
After finishing my PhD in the summer of 2002, I moved to the University of Notre Dame to work with Albert-László Barabási. His group has made a major contribution to understanding how complex networks emerge, what they look like, how they evolve, and what role they play in describing complex systems in general. In particular, they have investigated the complex networks inside the cell, looking at both metabolic and genetic networks. At the moment, I am mainly studying metabolic networks, using the techniques I learned in statistical mechanics.
You recently published a paper in Nature Biotechnology about a new method to predict protein function from experimentally derived protein-protein interactions. Can you briefly describe your approach?
Previous work had already indicated a significant association between protein interactions and protein functions: when you look at a pair of proteins that interact, in 70 percent of cases they share a common function. This means you can take experimental data on protein interactions, together with the functional classifications that have already been determined, and use this information to predict the function of proteins that have not yet been classified.
One approach that others had already used is the ‘majority rule’ assignment: you take a protein that has not yet been functionally classified, you look at the classified proteins that interact with it, and then you assign it the most common function found among those classified proteins. The main limitation [of this method] is that you only consider interactions between unclassified and classified proteins, and ignore interactions between unclassified proteins.
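The majority rule can be sketched in a few lines of Python. This is a minimal illustration, not code from the paper: it assumes each classified protein carries a single function label (real MIPS annotations can list several), represents interactions as an undirected edge list, and breaks ties alphabetically. The function name `majority_rule` and the toy protein names are hypothetical.

```python
from collections import Counter


def majority_rule(edges, labels):
    """Majority-rule assignment (illustrative sketch).

    edges:  iterable of (protein_a, protein_b) interaction pairs, undirected
    labels: dict mapping each classified protein to one function label
    Returns predictions for unclassified proteins that have at least one
    classified partner; all other unclassified proteins are left out.
    """
    # Build an undirected adjacency list from the edge list.
    neighbors = {}
    for a, b in edges:
        if a == b:
            continue  # ignore self-interactions
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    predictions = {}
    for protein, partners in neighbors.items():
        if protein in labels:
            continue  # already classified
        # Only classified partners vote; unclassified partners are ignored.
        votes = Counter(labels[p] for p in partners if p in labels)
        if votes:
            # Most common function; ties go to the alphabetically first label.
            predictions[protein] = max(sorted(votes), key=votes.get)
    return predictions


edges = [("A", "B"), ("A", "C"), ("A", "D"), ("D", "E"), ("F", "G")]
labels = {"B": "transport", "C": "transport", "D": "metabolism"}
print(majority_rule(edges, labels))  # {'A': 'transport', 'E': 'metabolism'}
```

Proteins F and G interact only with each other, so the majority rule says nothing about them. That is exactly the limitation described above.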
The idea of our method is to use all the information provided by the protein interaction network, including the interactions between unclassified proteins. To predict the function of an unclassified protein, we look at all the proteins, classified or not, that interact with it, and we assign it the most common function found among all its interacting partners.
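Under the simplifying assumption that each protein carries a single function (the interview does not spell out how the paper handles proteins with several functions), this rule can be written as one objective. Let A_ij = 1 if proteins i and j interact and 0 otherwise, and let σ_i be the function assigned to protein i, fixed for classified proteins and free for unclassified ones:

```latex
E(\sigma) = -\sum_{i<j} A_{ij}\,\delta(\sigma_i,\sigma_j),
\qquad
n_i(f) = \sum_{j} A_{ij}\,\delta(\sigma_j, f),
\qquad
\Delta E_i(a \to b) = n_i(a) - n_i(b).
```

Minimizing E maximizes the number of interacting pairs that share a function. Because switching an unclassified protein i from function a to function b changes E by n_i(a) - n_i(b), no single switch can lower E exactly when every unclassified protein already carries a most common function among all its partners, classified or not. E has the form of a Potts-model energy, and because the functions of neighboring unclassified proteins depend on one another, the minimum has to be found for all of them at once.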
This can be described mathematically by a classical model of statistical mechanics, the Potts model, and its implementation is computationally more complex [than the majority rule]. With the majority rule, you can consider each unclassified protein individually: you go to its classified neighbors and determine which function dominates among them. With our method, we also use the functional classifications assigned to other unclassified proteins. This turns it into a global optimization problem, in which you have to determine the classifications of all the unclassified proteins simultaneously. It can be solved with numerical techniques, for instance simulated annealing, a technique that is quite powerful for large-scale optimization problems where the optimal solution may be hidden among many nearly optimal solutions.
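The global assignment can be sketched with simulated annealing. This is a minimal sketch under a single-function simplification, not the paper's implementation: the energy is minus the number of interacting pairs whose assigned functions match, classified proteins keep their labels, and each step proposes a new function for one randomly chosen unclassified protein and accepts it with the Metropolis rule. The function name `anneal_functions`, the geometric cooling schedule, and the default parameters (`steps`, `t_start`, `t_end`) are illustrative choices.

```python
import math
import random


def anneal_functions(edges, labels, steps=20000, t_start=2.0, t_end=0.01, seed=0):
    """Assign functions to unclassified proteins by simulated annealing.

    Energy E = -(number of interacting pairs whose assigned functions match).
    Classified proteins keep their labels; unclassified ones are free.
    """
    rng = random.Random(seed)

    # Undirected adjacency list.
    neighbors = {}
    for a, b in edges:
        if a != b:
            neighbors.setdefault(a, set()).add(b)
            neighbors.setdefault(b, set()).add(a)

    unclassified = sorted(p for p in neighbors if p not in labels)
    functions = sorted(set(labels.values()))
    if not unclassified or not functions:
        return {}

    # Start from a random assignment of the unclassified proteins.
    state = dict(labels)
    for p in unclassified:
        state[p] = rng.choice(functions)

    def matches(protein, function):
        # Number of partners of `protein` currently assigned `function`.
        return sum(1 for q in neighbors[protein] if state[q] == function)

    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        p = rng.choice(unclassified)
        new = rng.choice(functions)
        old = state[p]
        if new == old:
            continue
        # Change in E if p switches from `old` to `new` (only p's edges change).
        delta = matches(p, old) - matches(p, new)
        # Metropolis rule: always accept downhill moves, sometimes uphill ones.
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            state[p] = new

    return {p: state[p] for p in unclassified}


edges = [("A", "B"), ("A", "C"), ("A", "F"), ("F", "G"), ("E", "D")]
labels = {"B": "transport", "C": "transport", "D": "metabolism"}
print(anneal_functions(edges, labels))
```

In this toy network, F and G never touch a classified protein, so a majority rule would leave them unassigned. The annealing run should settle on transport for A, F, and G and metabolism for E, since that assignment makes every interaction agree.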
[In our paper], we applied our functional prediction method to the yeast protein-protein interaction network. The interaction dataset was obtained from a paper published by Schwikowski and colleagues (Nature Biotechnology 18, 1257-1261 (2000)). The functional classifications came from the MIPS [Munich Information Center for Protein Sequences] database.
How accurate are your predictions, given that some people say up to 50 percent of yeast two-hybrid interactions from large screens are spurious?
We performed two tests to assess the reliability of the method. The first concerned the success rate: what is the probability that our predictions are correct? We took a certain fraction of the classified proteins, treated them as unclassified, predicted their functions with our method, and then compared these predictions with their actual classifications. We found that if a protein has more than two interacting partners, the method has a success rate of 60-70 percent. When the protein has only one or two interacting partners, the success rate is lower, 30 percent or less. Essentially, this tells us that the predictions are reliable if the protein has more than two interacting partners.
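The first test can be sketched as a leave-out experiment that reports the success rate separately for proteins with one or two partners and for proteins with more than two. This is a hypothetical harness, not the authors' code: `majority_predict` is only a stand-in so the example runs on its own, and any predictor taking `(edges, labels)` and returning a dict of predictions can be passed instead. The names `leave_out_test` and `neighbors_of` and the `fraction`, `hidden`, and `seed` parameters are assumptions.

```python
import random
from collections import Counter


def neighbors_of(edges):
    """Undirected adjacency list built from (protein_a, protein_b) pairs."""
    nbrs = {}
    for a, b in edges:
        if a != b:
            nbrs.setdefault(a, set()).add(b)
            nbrs.setdefault(b, set()).add(a)
    return nbrs


def majority_predict(edges, labels):
    """Stand-in predictor (majority rule); any function with this signature works."""
    predictions = {}
    for protein, partners in neighbors_of(edges).items():
        votes = Counter(labels[p] for p in partners if p in labels)
        if protein not in labels and votes:
            predictions[protein] = max(sorted(votes), key=votes.get)
    return predictions


def leave_out_test(edges, labels, predict, fraction=0.2, hidden=None, seed=0):
    """Hide some classified proteins, predict them, and score by degree."""
    if hidden is None:
        rng = random.Random(seed)
        pool = sorted(labels)
        hidden = set(rng.sample(pool, max(1, int(fraction * len(pool)))))
    training = {p: f for p, f in labels.items() if p not in hidden}
    predictions = predict(edges, training)
    nbrs = neighbors_of(edges)

    # [correct, scored] for proteins with 1-2 partners vs more than 2 partners
    tally = {"1-2 partners": [0, 0], ">2 partners": [0, 0]}
    for p in hidden:
        if p not in predictions:
            continue  # no prediction possible (e.g. no classified partners)
        bucket = ">2 partners" if len(nbrs[p]) > 2 else "1-2 partners"
        tally[bucket][1] += 1
        tally[bucket][0] += predictions[p] == labels[p]
    return {k: (c / n if n else None) for k, (c, n) in tally.items()}


edges = [("H", "P1"), ("H", "P2"), ("H", "P3"), ("H", "P4"), ("L", "M")]
labels = {"H": "transport", "P1": "transport", "P2": "transport",
          "P3": "transport", "P4": "metabolism", "L": "metabolism", "M": "transport"}
print(leave_out_test(edges, labels, majority_predict, hidden={"H", "L"}))
```

Hiding a fixed set makes the toy run reproducible; on real data one would hide a random fraction and repeat over several seeds.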
The second test addressed the uncertainty in protein interaction data from two-hybrid experiments, which contain a significant number of false positives and false negatives. The effect of this uncertainty can be modeled by rewiring a certain fraction of the protein interactions. We took the original network, removed a [certain number of the] reported interactions, and randomly added new interactions between proteins that do not interact according to the available data. In this way you generate a different network; you then apply our method to both the original and the rewired network and compare the two functional classifications. We saw that if the percentage of rewired links is small, less than 10 percent, you obtain the same functional classifications in 90 percent of the cases. So even though the two-hybrid data contain false positives and negatives, you can still obtain reliable functional predictions.
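The rewiring test can be sketched as follows. This is an illustrative harness with assumed names (`rewire`, `agreement`, `robustness`), not the authors' code, and `majority_predict` is again only a stand-in predictor so the block runs on its own. `rewire` removes a fraction of the interactions and adds the same number of new ones, drawn only between proteins that do not interact in the original data.

```python
import random
from collections import Counter


def majority_predict(edges, labels):
    """Stand-in predictor (majority rule) so the example runs on its own."""
    nbrs = {}
    for a, b in edges:
        if a != b:
            nbrs.setdefault(a, set()).add(b)
            nbrs.setdefault(b, set()).add(a)
    predictions = {}
    for protein, partners in nbrs.items():
        votes = Counter(labels[p] for p in partners if p in labels)
        if protein not in labels and votes:
            predictions[protein] = max(sorted(votes), key=votes.get)
    return predictions


def rewire(edges, fraction, rng):
    """Remove a fraction of interactions and add as many random new ones.

    New interactions join only pairs that do not interact in the original data.
    """
    original = {frozenset(e) for e in edges if e[0] != e[1]}
    proteins = sorted({p for pair in original for p in pair})
    k = round(fraction * len(original))
    n = len(proteins)
    if k > n * (n - 1) // 2 - len(original):
        raise ValueError("not enough non-interacting pairs to rewire into")
    kept = set(rng.sample(sorted(original, key=sorted), len(original) - k))
    added = set()
    while len(added) < k:
        pair = frozenset(rng.sample(proteins, 2))
        if pair not in original:
            added.add(pair)
    return sorted(tuple(sorted(pair)) for pair in kept | added)


def agreement(pred_a, pred_b):
    """Fraction of proteins predicted in both runs that got the same function."""
    shared = pred_a.keys() & pred_b.keys()
    if not shared:
        return None
    return sum(pred_a[p] == pred_b[p] for p in shared) / len(shared)


def robustness(edges, labels, predict, fraction, seed=0):
    """Compare predictions on the original and on one rewired network."""
    rng = random.Random(seed)
    return agreement(predict(edges, labels),
                     predict(rewire(edges, fraction, rng), labels))


edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E"), ("E", "F")]
labels = {"B": "x", "D": "y"}
print(robustness(edges, labels, majority_predict, fraction=0.0))  # 1.0
```

Averaging `robustness` over many seeds at a small `fraction` mirrors the test described above; the 90 percent figure quoted in the interview comes from the paper's own method and data, not from this sketch.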
Have you applied your approach to other large experimental datasets?
At the moment, we are applying our method to more recent and complete data. In particular, we are using a larger dataset of protein interactions provided by DIP [Database of Interacting Proteins] that combines information from a variety of sources, and more recent protein function classifications from MIPS. This way we will be able to make functional assignments for a larger number of unclassified proteins and compare the results with our previous assignments, providing a further test of the reliability of our predictions.
Has anyone validated your functional predictions experimentally?
No, not as far as we know. We have not established contact with any experimental group yet.
Have you compared your approach for prediction of protein function to others that rely, for example, on sequence similarity?
We are in the process, but we have not finished it yet. We have done it for a few cases but not on a large scale.
How can biologists make use of your method? Is it available as software?
At the moment we don’t provide any software where you can just plug in the protein interactions and functional classifications and get out the functional predictions for unclassified proteins. [However], we can provide assistance to those willing to implement the method, or we can collaborate with them.
What can still be improved?
From the point of view of methodology, we are not working on any improvements at the moment. [Besides applying the method to more recent and complete data], it will also be important to get a better theoretical understanding of the method in order to improve its algorithmic implementation.
Will you collaborate with experimental groups?
Yes, we are planning to, but at the moment we are not in contact with any. In June, I will be presenting this work at the Beyond Genome conference in San Diego. We would like to get feedback from other researchers working in the field and to establish collaborations. Obtaining experimental verification of our protein function predictions will probably be the most important next step.