This is the last of four articles surveying leading proteomics researchers about the most notable achievements in proteomics during the 2010s. Part 1 can be found here, part 2 here, and part 3 here.
NEW YORK – At the beginning of 2010, members of the Human Proteome Organization (HUPO) met in Seattle to develop an international initiative to map the full human proteome.
Out of that effort emerged the Human Proteome Project (HPP) and then the Chromosome-Centric Human Proteome Project (C-HPP) and the Biology/Disease-Driven Human Proteome Project (B/D-HPP), which, to date, have managed to identify proteins from roughly 90 percent of all predicted protein-coding human genes.
Gilbert Omenn, professor of human genetics at the University of Michigan and one of the leaders of the HPP, identified the effort as one of the key developments in proteomics over the last decade.
While the project's goal of mapping the human proteome was a substantial endeavor, its most important legacy is perhaps the tools and processes developed in pursuit of this goal. Much the same could be said of several of the decade's other major proteomic initiatives, like the National Cancer Institute's Clinical Proteomic Tumor Analysis Consortium (CPTAC) and the Sweden-based Human Protein Atlas, both of which were cited by researchers as being among the most significant efforts in proteomics during the 2010s.
"Crucial contributions to proteomics during the decade 2010-2019 have come from the HUPO Human Proteome Project," Omenn said. "The HPP mobilized a global effort to identify and characterize the protein products from each protein-coding gene, stimulated data sharing through creation of the ProteomeXchange at EBI/PRIDE to register proteomics studies, brought together the PeptideAtlas in Seattle and neXtProt in Geneva to uniformly re-analyze all publicly-available human proteomics mass spectrometry datasets with much-needed guidelines for credible detection and curation, [and] helped stimulate new instrument development and deployment in numerous biological studies."
Young-Ki Paik, a professor of biochemistry at South Korea's Yonsei University and a leader of the C-HPP effort, likewise highlighted the HPP's contributions.
"In my own personal view, one of the most significant developments in proteomics could be the introduction of the new workflow and team approach for human proteome annotation in a genome-wide fashion [introduced by the HPP]," he said.
Since it launched in September of 2010, the C-HPP has been the most prominent of the HPP initiatives. The project was initially conceived as a way to provide more definition and an endpoint to the various proteome mapping projects HUPO was pursuing in the years prior. It called for the creation of teams organized by country, with each country or team of countries adopting one or more of the 23 human chromosomes and characterizing one representative protein for each gene located on the chromosome.
In addition to identifying these proteins, the teams sequenced each protein, identified proteotypic peptides and antibodies that could be used to isolate it, and investigated its roles in different disease states.
As of the 2012 HUPO meeting, the initiative had identified around 14,000 of the presumed 20,000 proteins in the human proteome. By 2016, that number had risen to around 16,000, and as of 2019 it stood at 17,694.
In an interview last year, Omenn suggested that the effort has recently reached a stage where the remaining proteins could prove particularly difficult to find. Some might be expressed only under very specific conditions that are difficult to capture. Others might be embedded in membranes and be difficult to solubilize. Still others might not have the lysine and arginine residues required for the tryptic digestion commonly used in proteomic workflows.
Mark Baker, professor of proteomics at Macquarie University and a leader of the HPP, said that in the present decade, the project will work to define itself by looking more closely at the proteins' biological functions in health and disease. Like Omenn and Paik, he said that he suspected that "the highest long-term impact will come from the knowledge generated through [the HPP]," and cited the development of standards for assessing the quality of proteomics data, and tools for communicating their findings, as key achievements driven by the initiative.
Baker also noted the collaboration of the C-HPP with the Human Protein Atlas (HPA), an antibody-driven project that likewise aims to catalogue the human proteome, though with more of an emphasis on the localization of proteins in specific tissues and cell types.
Launched in 2003 with funding from the Knut and Alice Wallenberg Foundation, the project had identified proteins from just over 10,000 genes at the beginning of the last decade and currently has information on proteins from just over 17,000 genes. The project has also validated 26,371 antibodies to these proteins, providing a resource where researchers can find validation and performance data on a large collection of protein affinity agents.
Stanford University professor Michael Snyder cited the HPA as one of the decade's highlights, saying that in providing a "first-generation map of the expression of many human proteins" it has "greatly extended the work done on model organisms [like yeast] two decades earlier."
Mathias Uhlén, professor at the Science for Life Laboratory at the Karolinska Institute and Royal Institute of Technology Stockholm and leader of the HPA, also chose the advances in proteome mapping made by initiatives like the HPP and HPA as the key development of the last 10 years.
"The number of protein-coding genes has in the last 10 years been defined to slightly less than 20,000. In the same period, the number of antibodies towards the human proteins has exploded," he said. "Together with advances in transcriptome analysis and mass spectrometry, this has led to evidence at the protein or transcriptional level for more than 99 percent of these protein-coding genes. A large number of mapping efforts, including the Human Protein Atlas, have allowed a more holistic view of the human proteome and its constituents in cells, tissues, organs, and body fluids."
While the HPP and HPA are concerned primarily with identifying and characterizing the proteins comprising the human proteome, another of the decade's major initiatives, the CPTAC project, took as its mission the development of technologies for the clinical application of proteomics.
The project launched in 2006 as the Clinical Proteomic Technologies for Cancer (CPTC) initiative with a focus on exploring and correcting the experimental variability issues that had emerged as major challenges in proteomics. The second stage of the project launched in 2011 with a new name, the Clinical Proteomic Tumor Analysis Consortium (CPTAC), and a new focus: combining discovery and targeted proteomic data from tumor tissue samples with genomic characterizations of those same samples by the NCI-funded Cancer Genome Atlas.
That marked one of the first major forays into proteogenomics, which Henry Rodriguez, director of the Office of Cancer Clinical Proteomics Research at the NCI and head of the CPTAC effort, cited as one of the major application advances in proteomics over the last 10 years.
He also highlighted CPTAC's role in driving this emerging discipline, citing the initiative for having "established experimental standards in proteomic, analytical, and computational workflows; pioneered the integration of proteomics with genomics to produce a more unified understanding of cancer biology; and implemented its application and transition into cancer clinical trials."
Rodriguez added that the project has "helped NCI to develop some of the world’s largest open‐access public repositories of multi‐omics data sets [DNA, RNA, proteins, and images], fit‐for‐purpose proteomic assays, and cancer‐specific antibodies," and that CPTAC researchers and collaborators have "released proteogenomic datasets from greater than 10 cancer types and counting."
The CPTAC project carries on into this decade, with its third stage, launched in 2017, ongoing. Proteogenomics continues to be at the core of the work, but the initiative has shifted into more translational research, with participants using proteogenomic data to better understand patients' drug response and the development of resistance, with the ultimate goal of employing assays developed through the effort in clinical trials.