Virginia Tech scientists have developed an algorithm that uses protein interaction data to make predictions about the function of individual molecules.
In a study published this week in PLoS Computational Biology, the researchers used the algorithm to identify potential HIV dependency factors (HDFs), a class of human proteins that are essential to HIV replication and could serve as both prognostic biomarkers and therapeutic targets for the disease.
Called SinkSource, the algorithm emerged from work the researchers have been conducting on gene-function prediction, lead study author T.M. Murali, associate professor of computer science at Virginia Tech, told ProteoMonitor. He said his team decided to apply it to the question of HDFs after coming across three previous RNAi studies looking into potential HDFs and noticing a surprisingly small overlap in the proteins identified by the different groups.
"We realized that our algorithm would be very nicely applicable to trying to understand why it is that these HDF studies have such little overlap and to, at the same time, potentially predict more HDFs," said Murali.
His team combined the proteins identified in the three studies with known non-HDF proteins in the context of a human protein-protein interaction network constructed from a series of protein interaction databases. Using the HDFs identified in the studies as positive examples and the non-HDF proteins in the network as negative examples, they used SinkSource to predict undiscovered HDFs.
"The property we were interested in was, 'Is or is not [a protein] an HDF?' and we knew about some HDFs that these experimental screens told us about," Murali said. "So by placing the genes and protein products in the context of a protein interaction network, we can spread the HDF-ness of the genes across the network."
"That signal gets transmitted [in such a way] that proteins close to the HDF proteins have a greater probability of being HDFs and those farther away have a lesser probability," he said.
While the PLoS Computational Biology paper focused on HIV-related proteins, the SinkSource algorithm is "in principle agnostic to the specific biological question that we want to address," Murali said. "If we had a specific cancer pathway that we were interested in and we knew that that cancer pathway hadn't been fully fleshed out, we could apply the algorithm to ask which other genes and proteins should we think of as part of this pathway, thereby improving our state of knowledge about it."
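The signal-spreading idea Murali describes can be illustrated with a small sketch. This is not the team's published code. It assumes a simple scheme in which known HDFs are held at a score of 1, known non-HDFs are held at 0, and every other protein repeatedly takes the weighted average of its neighbors' scores. The function name, the toy protein names, and the edge weights are all hypothetical.

```python
from collections import defaultdict


def propagate_scores(edges, positives, negatives, iterations=200, tol=1e-9):
    """Illustrative sketch (an assumption, not the published SinkSource code).

    edges: iterable of (protein_a, protein_b, weight), weights assumed positive.
    positives: proteins clamped to 1.0 (known HDFs).
    negatives: proteins clamped to 0.0 (known non-HDFs).
    Returns a dict mapping each protein to a score between 0 and 1.
    """
    # Build an undirected weighted adjacency map.
    neighbors = defaultdict(dict)
    for a, b, weight in edges:
        neighbors[a][b] = weight
        neighbors[b][a] = weight

    # Labeled proteins get fixed scores; the rest start at a neutral 0.5.
    scores = {}
    for protein in neighbors:
        if protein in positives:
            scores[protein] = 1.0
        elif protein in negatives:
            scores[protein] = 0.0
        else:
            scores[protein] = 0.5
    unlabeled = [p for p in neighbors if p not in positives and p not in negatives]

    # Repeatedly replace each unlabeled score with the weighted neighbor average.
    for _ in range(iterations):
        largest_change = 0.0
        for protein in unlabeled:
            total_weight = sum(neighbors[protein].values())
            new_score = sum(
                weight * scores[other] for other, weight in neighbors[protein].items()
            ) / total_weight
            largest_change = max(largest_change, abs(new_score - scores[protein]))
            scores[protein] = new_score
        if largest_change < tol:
            break
    return scores


# Hypothetical toy network: a chain HDF_1 - A - B - NON_HDF_1.
toy_edges = [("HDF_1", "A", 1.0), ("A", "B", 1.0), ("B", "NON_HDF_1", 1.0)]
scores = propagate_scores(toy_edges, positives={"HDF_1"}, negatives={"NON_HDF_1"})
for name in ("A", "B"):
    print(name, round(scores[name], 3))
```

In this toy chain, protein A sits next to the known HDF and settles near 2/3, while protein B sits next to the known non-HDF and settles near 1/3, mirroring the point that proteins closer to HDFs receive a higher score.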
Anna Malovannaya, a postdoctoral fellow in the lab of Baylor College of Medicine researcher Bert O'Malley, added that "it's very interesting work because it essentially brings up the question of how we predict the function within protein interaction networks."
Malovannaya, who was not part of the Virginia Tech team's work, developed many of the bioinformatic tools used in a recently published large-scale protein interaction study done by O'Malley's team (PM 6/3/2011).
"What it pinpoints is that when you have particular candidates but you don't see a complete pathway from the perspective of genomics or other types of screens, the protein interaction network is something that can pull things together and actually make functional sense of it," she told ProteoMonitor. This suggests the potential usefulness of reinterrogating previously done protein studies in the context of interaction networks, she added.
"I think it highlights the usefulness of applying large-scale, high-density interactome networks to other studies that have already been done where people might have focused just on part of the picture," Malovannaya said.
The Virginia Tech study offers "a very good case where you take what appears to be very sparse data without much overlap from three different studies, and once you put it together in the context of a protein network you can actually find a connection," Malovannaya said. "And I think that is a very valid thing to do. Protein interactions are actually one of the more restrictive parameters you can find in a cell, so I think this is a very good way to go."
Murali's team currently has a paper in review wherein they applied the algorithm to a previously characterized functionally linked network in Arabidopsis, he said. "This was a network published by another group, so we simply used the network along with the existing annotations for Arabidopsis genes to ask [for] which we can actually make good predictions … based on our algorithms."
The study aimed to test "not only how accurate our algorithm is, but also whether the state of knowledge of the functional interactions in Arabidopsis are sufficient to allow reliable predictions of gene functions," Murali said. "There will be some processes for which we can make high-quality predictions and some for which we cannot. So this will give us a clue as to which directions we should explore."
The incorporation of this sort of experimental data is key to further developing the algorithm, making collaborations with experimental researchers a crucial part of the work, Murali noted.
The HDF work was done with University of Washington microbiologist Michael Katze. The group is also involved in an ongoing collaboration with Virginia Tech researcher Padma Rajagopalan that is studying signaling pathways in liver tissue models.
To test the predicted HDFs, the group observed how two species of primates, African green monkeys (AGMs) and pigtailed macaques (PTMs), responded to being infected with the simian immunodeficiency virus (SIV), the non-human primate version of HIV.
AGMs are natural reservoirs of SIV that don't develop AIDS, while PTMs do develop the disease when infected with SIV. Using gene-expression data, the group demonstrated that many of the predicted HDFs showed differential expression in the two primates when infected. This bolstered the case that they were, in fact, true HDFs.
Additionally, the researchers are making their predictions publicly available with the aim of encouraging other researchers to test and validate them, Murali added.
"We don't want to keep these predictions to ourselves," he said.."We're hoping that the entire scientific community is able to look at them and validate them as each group's expertise or funding allows."