A study by researchers at Johns Hopkins University and the Institute of Bioinformatics in Bangalore, India, found that a protein's biological importance does not correlate well with the number of other proteins with which it interacts, and that proteins associated with diseases are more likely to interact with other disease-associated proteins.
The first conclusion debunks a popular theory that is based on early studies of the yeast system, while the second conclusion has important implications for drug discovery, said Akhilesh Pandey, an assistant professor at Johns Hopkins and the senior author of the study, published in this month's Nature Genetics.
"If I told you that a certain protein has a lot of interactions, a person would have liked to believe that it is essential. But no, that is not a great predictor," said Pandey. "You can't just conclude that high connectively equals essentiality. It is not true."
In terms of drug discovery, Pandey said that the fact that a disease gene is more likely to interact with another disease-causing gene than with a gene that is not yet known to cause disease could help find future drug targets, as long as at least one disease-associated protein is already known.
"People are looking for guidance. They say, 'If I know one gene, for example an ion channel, is defective in some patients with hypertension, how do I find the other genes that are involved?'" said Pandey. "You can use a protein-protein interaction approach to narrow down the candidates from hundreds of genes to maybe three or five genes. It's a way to prioritize and to find the next drug interaction target."
"Each gene and protein should be individually valued. It doesn't take hundreds of interactions for a protein to be very important. That is the take-home message."
Pandey and his research team came to their conclusion after analyzing the Human Protein Reference Database and a knockout mice database, and analyzing high-throughput worm, fly, and yeast interaction studies.
The HPRD is a database of information on human proteins, collected from published papers and curated over the last four years by a team of about 50 to 70 scientists. The scientists are based at the Institute of Bioinformatics (IOB), a non-profit institution in Bangalore that Pandey founded in 2002.
Information compiled in the HPRD includes the subcellular localization of proteins, protein interactions, protein post-translational modifications, the proteins' domain within the human body, and any disease associations of the proteins.
To figure out the correlation between the essentiality of a protein and the number of proteins with which it interacts, Pandey's team first compiled a long list of genes that when deleted in knockout mice result in death. Those genes were termed "essential."
Another set of genes was termed "non-essential" because deleting them in knockout mice did not cause death.
Pandey's team then went through the HPRD to determine how many interactions "essential" proteins have compared with "non-essential" proteins.
The researchers found that essential proteins do not, in general, have more interactions than non-essential proteins.
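The comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the study's actual analysis: the interaction pairs, protein names, and essential/non-essential labels are invented toy data, and the helper names `interaction_counts` and `median_degree` are hypothetical. A real analysis would also apply a statistical test rather than compare two medians by eye.

```python
from collections import defaultdict
from statistics import median


def interaction_counts(pairs):
    """Count distinct interaction partners per protein from undirected pairs."""
    partners = defaultdict(set)
    for a, b in pairs:
        if a == b:
            continue  # self-interactions are ignored in this sketch
        partners[a].add(b)
        partners[b].add(a)
    return {protein: len(others) for protein, others in partners.items()}


def median_degree(proteins, counts):
    """Median partner count for a group; proteins with no recorded interaction count as 0."""
    return median(counts.get(p, 0) for p in proteins)


# Toy data, invented for illustration only.
toy_pairs = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("E", "F"), ("D", "E")]
toy_essential = {"B", "E"}
toy_non_essential = {"A", "F"}

counts = interaction_counts(toy_pairs)
essential_median = median_degree(toy_essential, counts)
non_essential_median = median_degree(toy_non_essential, counts)
print("essential:", essential_median, "non-essential:", non_essential_median)
```

In this toy example the most connected protein, "A", is labeled non-essential, and the two groups end up with the same median, which mirrors the kind of result the researchers describe.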
"Each gene and protein should be individually valued. It doesn't take hundreds of interactions for a protein to be very important. That is the take-home message," said Pandey.
To analyze disease-associated proteins, Pandey's team took data from the HPRD, which includes 25,000 protein-protein interactions. They analyzed interactions among 1,077 genes linked to 3,133 diseases.
"When you have 25,000 interactions, that's a decent amount of data," said Pandey. "We were looking at the connections of these disease genes — who are they talking to?"
In addition to analyzing disease-associated proteins and essential vs. non-essential proteins, Pandey's team sought to find new human protein interactions based upon fly, worm, and yeast data.
The researchers compared two large datasets, one for fly and one for worm, each of which contained interactions from high-throughput yeast two-hybrid (Y2H) experiments. They looked for interactions between the same pair of orthologous proteins that appeared in both datasets.
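The cross-dataset comparison described above amounts to mapping each species' proteins onto shared ortholog identifiers and intersecting the resulting pairs. The sketch below shows one way this could be done. It is an assumption-laden illustration rather than the team's pipeline: the protein names, the ortholog-group labels ("G1", "G2", and so on), and the helper `to_shared_ids` are all invented for the example.

```python
def to_shared_ids(pairs, ortholog_map):
    """Map species-specific interaction pairs onto unordered pairs of shared
    ortholog-group IDs, dropping pairs that contain an unmapped protein."""
    mapped = set()
    for a, b in pairs:
        ga, gb = ortholog_map.get(a), ortholog_map.get(b)
        if ga is None or gb is None or ga == gb:
            continue  # skip unmapped proteins and within-group pairs
        mapped.add(frozenset((ga, gb)))  # frozenset ignores partner order
    return mapped


# Toy data, invented for illustration only.
toy_worm_pairs = [("w1", "w2"), ("w2", "w3"), ("w4", "w5")]
toy_fly_pairs = [("f2", "f1"), ("f3", "f6")]
toy_worm_map = {"w1": "G1", "w2": "G2", "w3": "G3", "w4": "G4"}
toy_fly_map = {"f1": "G1", "f2": "G2", "f3": "G3", "f6": "G6"}

shared = (to_shared_ids(toy_worm_pairs, toy_worm_map)
          & to_shared_ids(toy_fly_pairs, toy_fly_map))
print(sorted(sorted(pair) for pair in shared))
```

Storing each pair as a `frozenset` means an interaction recorded as (bait, prey) in one screen still matches the same pair recorded in the opposite order in the other.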
"If a worm and a fly study independently have the same orthologous proteins captured in two independent screens, they are very prime candidates for being true," Pandey reasoned.
In the end, the researchers found 36 overlapping interactions out of the many thousands that were analyzed. They then tested the human analogs of nine of those 36 interactions using an epitope-tagging approach and found that all nine did indeed interact in humans.
"We had a very high hit rate, and that is good," said Pandey. "It shows that at least our strategy of thinking is working. That is, if you observe two interactions in two different datasets, pay attention. It's very likely to be real."
They also found through the HPRD that the orthologs of seven of the 36 overlapping interactions had previously been shown to be part of complexes.
"People did not know that A and B directly touch each other, but they did know that A and B are in a complex," said Pandey.
The researchers also analyzed overlapping interactions between fly, worm, yeast, and human, and found that 16 interactions were common to all four species.
Pandey said he and his research team were surprised by how few interactions overlapped between species. Based on ongoing work comparing high-throughput Y2H data with curated data in the HPRD, which consists mostly of low-throughput biochemical experiments, Pandey's team believes that the low number of overlaps is due to a high number of false positives from Y2H experiments, as well as a difference in the types of proteins seen in Y2H studies and biochemical experiments, rather than due to fundamental differences in species.
"We have ongoing studies that have shown that human Y2H studies and human curated data studies also don't agree with each other and have a low extent of overlap," said Pandey. "In that case, you can not blame [the lack of overlap] on species difference."
Pandey said he and his research team expect to submit a paper within the next few months describing their analysis of human Y2H interactions and interactions from curated data. Much of the human Y2H interaction data comes from large datasets from Marc Vidal at the Dana-Farber Cancer Institute, Erich Wanker at the Max Planck Institute, and Joel Bader at Johns Hopkins, he added.
The fact that there is not much overlap between high-throughput interaction studies and low-throughput studies does not mean that Y2H is not valid, Pandey emphasized.
"I don't want to trash Y2H. It is very useful because it's very quick," he said. "But we need many different complementary methods to be used. We can not do just one type of study to figure out the interactome."
By Tien-Shun Lee