Scientists from Lawrence Berkeley National Laboratory and Harvard Medical School have completed one of the largest protein interaction maps to date, identifying 556 protein complexes consisting of nearly 5,000 Drosophila melanogaster proteins.
Published last week in Cell, the study provides a tool for investigating the roughly one-third of predicted Drosophila proteins still without functional annotation and for generating hypotheses for future investigation, K.G. Guruharsha, a Harvard researcher and first author on the paper, told ProteoMonitor.
The Drosophila study is the second large protein complex network to appear in Cell in recent months, following the publication in the journal's May edition of a paper by Baylor College of Medicine scientists mapping 3,290 affinity-purified protein complexes from HeLa S3 that identified more than 11,000 proteins involved in regulating gene expression (PM 6/3/2011).
While both studies used co-affinity purification followed by mass spectrometry to isolate and identify protein complexes, the Berkeley-Harvard effort differed from the Baylor work by using affinity-tagged recombinant proteins as baits, instead of pulling down complexes by targeting native proteins with antibodies.
This approach allowed the researchers to achieve wide coverage of the Drosophila proteome without the expensive and time-consuming work of obtaining antibodies for all the targets of interest, said Robert Obar, director of Drosophila Proteomics at HMS and an author on the paper.
"I think it's fair to say that the [affinity-tagging approach] lends itself to high-throughput parallel processing much more than using a different antibody for each different experiment," he told ProteoMonitor. "It's a lot less expensive than creating and characterizing and validating antibodies for many different proteins."
Roughly 20 percent of the clones used were unable to successfully express the tagged bait proteins at detectable levels, but, Obar said, "we were willing to take a hit in some proteins not being amenable to our process in order to get things in higher throughput and more in parallel."
The method's high throughput allowed the researchers to put together a massive interaction dataset, which, Obar noted, enabled them to "do very high-quality informatics" to tease out more subtle or obscure interaction patterns.
For instance, he said, "even if a protein that we're trying to find interactors for with a given experiment is a very abundant protein or a very sticky protein that tends to show up with background as a non-specific contaminant, we can actually detect that protein and its interactors by doing a very large number of experiments to distinguish where the background lies and where the cutoff is between background and foreground."
The researchers built the affinity-tagged proteins using clones developed by the Berkeley Drosophila Genome Project, which is co-directed by Berkeley researcher Susan Celniker, also an author on the paper. This collection of affinity-tagged clones is now maintained as part of the BDGP under the name Universal Proteomics Resource.
Mass spec work for the project was done in Harvard researcher Steven Gygi's lab, which used a Thermo Scientific LTQ XL instrument to identify the co-purified complexes.
The map, Guruharsha noted, complements a previous Drosophila protein interaction map developed using yeast two-hybrid systems. While yeast two-hybrid screens are good at identifying more transient and binary protein-protein interactions, affinity co-purification offers a better look at how multiple proteins come together in complexes, he said.
This feature, combined with the vast size of the dataset generated, allowed the researchers to identify even complexes containing no bait proteins, he added.
"We have complexes in our map that are exclusively made up of proteins that were never used as a bait," Guruharsha said. "Because they come down as part of other bait purifications and [because] our analysis looks at both bait-prey and prey-prey interactions within the entire dataset, we were able to identify those complexes with high confidence as well."
To distinguish genuine interactors from nonspecific interactors, the researchers developed a scoring system based on the hypergeometric probability distribution. They further analyzed these identified interactions using the Markov clustering algorithm.
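As a rough illustration of how a hypergeometric score can separate specific co-purification from background, the Python sketch below asks how surprising it is that two proteins turn up together in as many purifications as they do, given how often each appears overall. It is a simplified sketch and not the authors' scoring system: the presence/absence model, the function names, and the toy protein names are all assumptions made for this example, and the clustering step is not shown.

```python
from math import comb, log10


def hypergeom_tail(total, n_a, n_b, n_both):
    """P(X >= n_both) for X ~ Hypergeometric(total, n_a, n_b).

    total  -- number of purifications in the dataset
    n_a    -- purifications containing protein A
    n_b    -- purifications containing protein B
    n_both -- purifications containing both
    """
    upper = min(n_a, n_b)
    denom = comb(total, n_b)
    tail = sum(
        comb(n_a, i) * comb(total - n_a, n_b - i)
        for i in range(n_both, upper + 1)
    )
    return tail / denom


def cooccurrence_score(purifications, prot_a, prot_b):
    """Return -log10 of the hypergeometric tail probability for a protein pair.

    `purifications` is a list of sets of protein names (hypothetical format).
    Higher scores mean the co-occurrence is less likely to be chance.
    """
    total = len(purifications)
    n_a = sum(prot_a in p for p in purifications)
    n_b = sum(prot_b in p for p in purifications)
    n_both = sum(prot_a in p and prot_b in p for p in purifications)
    return -log10(hypergeom_tail(total, n_a, n_b, n_both))


# Toy data: "Sticky" shows up in every purification (a background contaminant),
# while "A" and "B" appear only together.
purifications = [
    {"A", "B", "Sticky"},
    {"A", "B", "Sticky"},
    {"A", "B", "Sticky"},
    {"C", "Sticky"},
    {"D", "Sticky"},
    {"E", "Sticky"},
    {"F", "Sticky"},
    {"G", "Sticky"},
]

specific = cooccurrence_score(purifications, "A", "B")
background = cooccurrence_score(purifications, "A", "Sticky")
print(f"A-B score: {specific:.2f}")
print(f"A-Sticky score: {background:.2f}")
```

In this toy dataset the A-B pair scores well above zero, while the A-Sticky pair scores zero because a protein present in every purification co-occurs with everything by default. This is the kind of background-versus-foreground distinction Obar describes, and it only becomes reliable when the number of purifications is large.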
The researchers evaluated the quality of their map by using their methods to analyze two previously well-characterized complexes – the proteasome and the SNARE complexes – and found that their analyses confirmed previously reported interactions. They also examined the extent to which members of identified complexes shared Gene Ontology annotation and whether the genes encoding proteins in a complex tend to be coexpressed.
They also performed cross-species validation, using orthologous HA-tagged human proteins as baits in HEK293 cells and confirming 58 of 114 interactions (about 51 percent) originally defined in Drosophila. This, the authors noted, suggests "the value of the [Drosophila protein interaction map] as a reliable resource for biological hypothesis in human cells."
Data from the study is stored at Harvard as well as in the open access FlyBase database maintained by teams at Harvard, University of Cambridge, Indiana University, and the University of New Mexico.
Over the five-year course of the project, the researchers have been regularly releasing their raw mass spec data to FlyBase, Obar said. "We realized it would be a good central location where people could look for information on their protein or gene of interest, including the interaction data," he explained.
They are now in talks with the FlyBase curators to explore what new gene or protein annotations data from the project might provide, Obar added.
"Our database might contain a lot of spectral information on proteins and peptides that have never been seen in [Drosophila] before," he said. "So we are in discussion [with FlyBase] to see how we can map the peptide data we've seen from the mass spec back to the genome to see if it improves annotation."
Moving forward, Guruharsha said, the researchers plan to use the map to study several pathways of interest, with Notch signaling being the first area of investigation.
"Our lab has done a lot of genetic screens looking at genetic modifiers to Notch signaling," he said. "So we want to put [these genes] in the context of a network, to overlap these genes onto the [protein interaction] map we just published and look at the interactions between genetically validated Notch signaling modifiers."
"Now that we actually have a network, it produces a lot of hypotheses," Obar said. "One of them is just the general question of what happens if you perturb the network in [a certain] place – what are the responses in the network? The Notch pathway is one we have a lot of experience with and one we're using to generate very specific hypotheses about how different downstream and upstream effects are felt when you perturb different parts of the system."