CuraGen, with partners Wayne State University and Yale University, published a protein-interaction map of the Drosophila proteome in Thursday’s online edition of Science.
The map, which co-author John Chant, head of genomics and proteomics at CuraGen, called “the first proteomic map of a higher eukaryotic organism,” was created through a combination of yeast-2-hybrid experiments and extensive informatics modeling. According to Chant, it constitutes a “prelude and template” for a parallel human map that CuraGen is developing but has not released.
“I think other pharma companies will be very interested in this information. … For computational biologists, this is a tremendous data set,” Chant told ProteoMonitor regarding the Drosophila map.
Chant and his collaborators created the map, which describes 4,679 proteins and 4,780 interactions, by first screening 10,623 predicted transcripts to produce a draft map of 7,048 proteins and 20,405 interactions. The draft was then run through a “confidence” screen based on a training set of interactions that were reviewed by an “expert biologist,” who rated each theoretical protein interaction based on its feasibility. The resulting program assigned a level of confidence to each interaction in the draft.
A revised map of high-confidence interactions was then created, and analyzed to see which classes of proteins were enriched or depleted. The specific global and local pathways were analyzed as well, with local pathways typed into classes such as “transcription pathway” or “signal transduction pathway.” The interactions have been deposited with FlyBase, BIND, and DIP.
“The type of information we have [in the confidence program] is how many times we got the same hit, did the hit go in both directions in terms of the 2-hybrid, and what was the overall behavior of the bait and prey — was it promiscuous or not? The other thing we looked at is the network geometry,” Chant said. “All of these things intuitively make common sense and people have alluded to these things in terms of pathways and 2-hybrid before, but we in an unbiased fashion developed algorithms to use this information and assess which [interactions] were truly useful.”
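The kinds of evidence Chant lists (repeat hits, hits in both directions, bait and prey promiscuity, and network geometry) can be illustrated with a minimal sketch. This is not CuraGen's program: the protein names, the feature definitions, the hand-picked weights, and the 0.5 cutoff below are all hypothetical, and the article describes a model derived from an expert-rated training set rather than fixed weights.

```python
import math
from collections import defaultdict

# Hypothetical 2-hybrid hits as (bait, prey), one entry per observation.
observations = [
    ("A", "B"), ("A", "B"), ("B", "A"),   # repeated, and seen in both directions
    ("A", "C"), ("B", "C"),               # C is a shared partner of A and B
    ("X", "A"), ("X", "B"), ("X", "C"), ("X", "D"), ("X", "E"),  # X is promiscuous
]

def build_features(observations):
    """Summarize each undirected pair with the evidence types named in the article."""
    counts = defaultdict(int)       # pair -> number of times the hit was seen
    directions = defaultdict(set)   # pair -> bait/prey orientations observed
    neighbors = defaultdict(set)    # protein -> set of interaction partners
    for bait, prey in observations:
        pair = tuple(sorted((bait, prey)))
        counts[pair] += 1
        directions[pair].add((bait, prey))
        neighbors[bait].add(prey)
        neighbors[prey].add(bait)
    features = {}
    for pair in counts:
        a, b = pair
        features[pair] = {
            "times_seen": counts[pair],
            "bidirectional": len(directions[pair]) == 2,
            # Promiscuity proxy: partner count of the busier protein.
            "max_degree": max(len(neighbors[a]), len(neighbors[b])),
            # Simple network-geometry proxy: partners the two proteins share.
            "shared_neighbors": len(neighbors[a] & neighbors[b]),
        }
    return features

# Hypothetical weights, chosen by hand for illustration only.
WEIGHTS = {
    "bias": -1.0,
    "times_seen": 0.8,
    "bidirectional": 1.5,
    "max_degree": -0.4,
    "shared_neighbors": 0.6,
}

def confidence(f):
    """Map a feature dict to a score in (0, 1) with a logistic function."""
    z = WEIGHTS["bias"]
    z += WEIGHTS["times_seen"] * f["times_seen"]
    z += WEIGHTS["bidirectional"] * f["bidirectional"]
    z += WEIGHTS["max_degree"] * f["max_degree"]
    z += WEIGHTS["shared_neighbors"] * f["shared_neighbors"]
    return 1.0 / (1.0 + math.exp(-z))

features = build_features(observations)
scores = {pair: confidence(f) for pair, f in features.items()}

# Keep only interactions above the (hypothetical) cutoff.
high_confidence = sorted(pair for pair, s in scores.items() if s >= 0.5)

for pair, s in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, round(s, 3))
```

In this toy data the pair seen repeatedly and in both directions scores highest, while the hits involving the promiscuous protein X are pushed down, which is the qualitative behavior the quote describes.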
Chant acknowledged that yeast-2-hybrid screens “have gotten a bad name over the years.” The artificiality of the system, which involves overexpression of proteins, a lack of cellular context, and the introduction of reporters, commonly leads to many false positives. The informatics approach was designed to minimize these, and Chant said that several proof-of-concept studies have shown that it is robust. “It’s a major advance, and we’re pretty comfortable with [the results],” he said.
CuraGen’s use of a combination of yeast-2-hybrid screens and extensive informatics illustrates the systems biology approach that has gained momentum recently in the proteomics and genomics spheres. Systems biology is “absolutely” a focus for CuraGen, said Chant. “Understanding systems in their entirety and having as much information as possible is definitely our goal,” he said. High-throughput yeast-2-hybrid assays and bioinformatics are two technologies that CuraGen is known for — an important reason, according to Chant, that his group undertook this particular project.
Another reason, of course, is the potential for eventual commercial gain. “CuraGen historically has looked at the human genome and found novel druggable targets,” Chant said. “As such, there are orphan drug targets that are druggable, but we don’t know what they’ll do. So by putting these in the context of the pathways in the interaction map, you can put these orphan drug targets into the context of human health.”
Although the Drosophila map is a first step, CuraGen has already shifted its focus to the human map that the company is developing, with an eye to “pathways that have to do with human health.” Chant said that the company has already used the human data it has collected so far to find drug targets, and that ultimately the focus will be further downstream, on the clinic. “Really our goal is to use genomics and proteomics not for target discovery, but for managing our clinical programs down the line,” Chant said.