The Human Proteome Organization's Chromosome-Centric Human Proteome Project, or C-HPP, expects to achieve its goal of mapping and characterizing the roughly 20,000 proteins in the human proteome by the end of 2014, project leader William Hancock told ProteoMonitor this week.
There currently remain roughly 3,500 to 4,000 "missing" proteins for the project to characterize, said Hancock, chair in bioanalytical chemistry at Northeastern University and editor of the Journal of Proteome Research. This is down from the roughly 6,000 proteins the C-HPP estimated were outstanding as of the 2012 HUPO meeting.
"The numbers are not quite rock solid," Hancock said, noting that as efforts like the National Human Genome Research Institute's Encyclopedia of DNA Elements Consortium identify new genomic elements, the proteomic picture will likely shift, as well. But, he added, "the basic picture is that by the end of the next year, most of those [3,500 to 4,000 proteins] will no longer be missing."
With that portion of the project complete, the participating researchers will focus on characterizing at least one alternative splice variant for each protein, as well as phosphorylated, glycosylated, and acetylated forms. Following that, Hancock said, the researchers will "limber up to the big challenge of non-synonymous SNPs, and [determining] which of those actually make it to the protein level."
First proposed at the 2010 HUPO meeting (PM 2/18/2011), the C-HPP formally launched at the same meeting two years later (PM 9/14/2012). The project calls for participating countries to take one of the human chromosomes and characterize one representative protein for each gene located on the chromosome.
In addition to Hancock, the C-HPP is led by Young-Ki Paik, director of the Yonsei Proteome Research Center in Seoul, Korea; and Gyorgy Marko-Varga, a Lund University professor and AstraZeneca principal scientist.
The effort consists of two phases. The first, which is slated to run through 2018, will focus primarily on mapping and characterizing the roughly 3,500 to 4,000 proteins yet to be detected via mass spec, as well as post-translational modifications, alternative splicing transcripts, and non-synonymous SNPs of the roughly 14,300 well-characterized proteins. The second phase, which is planned to run from 2018 to 2022, will focus primarily on validating data from the first phase along with functional studies and developing drug targets and biomarker candidates.
Regarding his expectation that the project would close out its search for missing proteins by the end of next year, Hancock noted that although the goal was to identify these proteins via mass spec, there would be a subset not amenable to such analysis. The number of such proteins could potentially reach as high as 1,000 "if you take into account unusual sub-cellular localizations," he said, adding that for these targets the researchers hoped to collaborate with Royal Institute of Technology Sweden researcher Mathias Uhlen, who is working through his Human Protein Atlas project to validate antibodies to the entire human proteome.
The HPA currently contains information on more than 18,000 validated antibodies to 15,000 gene products, which corresponds to 75 percent of the protein-encoding genes in humans. Combining this antibody data with transcriptomic data, the effort has developed a map of proteins across all major human organs and tissues.
The C-HPP is also continuing to pursue collaboration with ENCODE researchers, Hancock said. Last year, he and his project co-chairs published a paper in Nature Biotechnology suggesting that such a partnership could help unravel how the actions and interactions of the genomic elements identified via the ENCODE initiative are manifested at the protein level (PM 11/30/2012).
In an email to ProteoMonitor, Paik noted that the C-HPP heads and ENCODE Technical Project Manager Kate Rosenbloom discussed ways of potentially enhancing the exchange of data between the two groups. He added that the Spanish team, which is investigating chromosome 16, has established its own data pipeline to allow it to make better use of ENCODE data.
While the C-HPP continues to make progress toward its goals, money remains a persistent concern for some researchers, as the lack of specific ties to disease or biological questions has made it difficult in some cases to win funding.
Hancock suggested, though, that while finding funding specifically for C-HPP work might be challenging, researchers are able to generate data for the project through their funded work on specific biological questions.
He cited a paper he and the chromosome 17 team authored in JPR this year looking at ERBB2 and related proteins in breast cancer.
When you publish a paper "you contact all the corresponding authors and ask them for their grants to acknowledge them," he said. "And I think the funding for that work exceeded $100 million."
"So it can't be said that the [National Institutes of Health] isn't supporting this, because they are – they are supporting the disease and biology and mass spec studies, all of which are contributing data to the [C-HPP], he added.
For instance, "I've been funded to do breast cancer research," Hancock said. "And a focal point has been ERBB2. But there are a bunch of [proteins] that are synergistic [with ERBB2] in the cancer process and some of those are uncharacterized."
"So that leads into, 'Well, to study the cancer process we need to find out more about these [proteins], and so we're also contributing to the missing protein list," he said. "So we're doing disease and biology, and we're doing [the C-HPP], and you move back in a very synergistic way."