MADRID (GenomeWeb) – With the proteomics field closing in – at least by some measures – on its long-held goal of characterizing the complete human proteome, the question of how to confidently identify and validate the proteins still outstanding received much discussion this week at the Human Proteome Organization's 13th annual meeting in Madrid.
This discussion was sparked in no small part by a pair of studies recently published in Nature – one led by Johns Hopkins University researcher Akhilesh Pandey and the other by Technical University of Munich researcher Bernhard Kuster – that separately presented the first two relatively comprehensive maps of the human proteome.
Though in theory something of a landmark for the field, the papers have been called into question by a number of outside researchers, in particular a group from the Spanish National Cancer Research Centre (CNIO), which in July published a critique in the Journal of Proteome Research suggesting that the two studies may have substantially overstated the number of high-quality protein identifications supported by their data.
The debate over the papers continued this week, as Kuster and Pandey both presented their findings at the meeting. Additionally, University of Michigan researcher Gil Omenn, chair of HUPO's Human Proteome Project (HPP), used part of a presentation updating attendees on the HPP's progress to critique the Kuster and Pandey papers, noting the various questions outside researchers have raised about them.
For instance, as Paul Rudnick, formerly a researcher at the National Institute of Standards and Technology and now the owner of proteomics informatics firm Spectragen-Informatics, told ProteoMonitor when the CNIO team's critique was published, the Pandey study used peptides as short as six amino acids to make protein IDs, which can lead to incorrect identifications.
More generally, Rudnick noted, the difficulty with large datasets like those compiled in the Pandey and Kuster studies is that once a certain level of proteome coverage is reached, any new identifications are very likely to be false positives.
This issue was likewise noted this week by Swiss Federal Institute of Technology in Zurich researcher Ruedi Aebersold, who observed that the field has reached "the phase where there is quasi saturation of shotgun [mass spec] discovered proteins."
When dealing with such large, saturated datasets, confidently claiming new identifications is statistically difficult, he said, because these identifications will typically be within the margins of error.
This, Aebersold said, raises the question of how the community should proceed in confidently identifying the proteins still to be detected as it seeks to compile a complete human proteome.
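A rough, back-of-the-envelope way to see Aebersold's point about new identifications falling within the margins of error: if a protein list is filtered at a fixed false discovery rate (FDR), the expected number of false hits scales with the size of the whole list. The sketch below assumes a 1 percent protein-level FDR and a hypothetical list of 18,000 proteins; neither figure comes from the speakers, and the function names are illustrative only.

```python
# Back-of-the-envelope sketch, not an analysis of any published dataset.
# Assumption: identifications are filtered at a fixed protein-level FDR, so the
# expected number of false hits scales with the size of the whole list.


def expected_false_positives(total_ids: int, fdr: float) -> float:
    """Expected count of false identifications in a list filtered at `fdr`."""
    return total_ids * fdr


def new_ids_within_error(total_ids: int, new_ids: int, fdr: float) -> bool:
    """True if the claimed new IDs could, in principle, all be false hits.

    Near saturation, true hits mostly re-find proteins that are already known,
    so false hits tend to accumulate among the "new" identifications.
    """
    return new_ids <= expected_false_positives(total_ids, fdr)


# Hypothetical numbers, for illustration only.
total, fdr = 18_000, 0.01
print(expected_false_positives(total, fdr))                # roughly 180 false hits
print(new_ids_within_error(total, new_ids=150, fdr=fdr))   # True
print(new_ids_within_error(total, new_ids=500, fdr=fdr))   # False
```

The comparison is deliberately crude; it only shows why a small error rate applied to a very large list can swamp a modest number of novel claims.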
Pandey, in his presentation this week, suggested that the field is still some way from reaching a consensus on this issue. He questioned why his and Kuster's papers have drawn a relatively large amount of criticism given that "there are hundreds of papers using the methodologies that we used." He added, "I think we are all going to learn as a community that these are intermediate milestones."
"None of this is close to being sorted out. Nothing is set in stone," he said. "So I think we need to get together and start to critically analyze the data, including the methodologies, and we should realize that every method has its caveats whether it is advertised or not."
One possible way forward, Aebersold noted, is to move to targeted mass spec approaches for confirming the presence of identified proteins: synthesizing peptides corresponding to the putative proteins and then checking whether the spectra of the synthetic peptides match those of the endogenous peptides that were identified.
Pandey's team in fact used this approach to confirm a number of its findings in the Nature paper, and he told ProteoMonitor at the time that he believed such a process should become standard for validating unusual matches.
In total, the Pandey-led study identified proteins encoded by 17,294 genes, or roughly 84 percent of the 20,493 human genes annotated in UniProt as protein coding. This number includes proteins encoded by 2,535 genes for which there was previously no protein evidence.
The Kuster-led project detected proteins encoded by 18,097 human genes, approximately 88 percent of the protein-coding genome. It also detected 19,376 of the 86,771 protein isoforms currently listed in UniProt.
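The coverage figures above follow from simple division. The sketch below assumes the Kuster percentage uses the same 20,493-gene UniProt denominator cited for the Pandey study, which the article does not state explicitly; the helper name is illustrative.

```python
# Protein-coding genes annotated in UniProt, per the figure cited for the Pandey study.
UNIPROT_PROTEIN_CODING_GENES = 20_493
# Protein isoforms listed in UniProt, per the figure cited for the Kuster study.
UNIPROT_ISOFORMS = 86_771


def coverage_percent(detected: int, total: int) -> float:
    """Percentage of `total` entries covered by `detected` entries."""
    return 100.0 * detected / total


pandey_pct = coverage_percent(17_294, UNIPROT_PROTEIN_CODING_GENES)
# Assumption: the same 20,493-gene denominator applies to the Kuster study.
kuster_pct = coverage_percent(18_097, UNIPROT_PROTEIN_CODING_GENES)
isoform_pct = coverage_percent(19_376, UNIPROT_ISOFORMS)

print(f"Pandey: {pandey_pct:.1f}%")     # Pandey: 84.4%
print(f"Kuster: {kuster_pct:.1f}%")     # Kuster: 88.3%
print(f"Isoforms: {isoform_pct:.1f}%")  # Isoforms: 22.3%
```

Under that assumption the results round to the roughly 84 and 88 percent reported; the isoform share, about 22 percent, is derived from the two counts given rather than quoted in the article.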
HUPO's Chromosome-Centric Human Proteome Project, meanwhile, has winnowed the number of proteins missing from its mapping project to 2,948, said Young-Ki Paik, director of the Yonsei Proteome Research Center in Seoul, South Korea, and one of the leaders of the C-HPP. This is down from roughly 3,500 to 4,000 proteins outstanding as of the 2013 HUPO meeting and roughly 6,000 as of the 2012 meeting.
Membrane proteins comprise a significant portion of the proteins thus far undetected by proteomic profiling projects, noted Boston University researcher Catherine Costello, suggesting that the field invest more effort in analysis of these targets. While membrane proteins are highly significant biologically, they are difficult to prepare for mass spec analysis via conventional workflows, and so have received relatively little attention from the field.
Despite the questions and uncertainty about how to proceed in characterizing the last portions of the human proteome, presentations this week also pointed to proteomics' potential clinical impact, even in the absence of a fully mapped human proteome.
For instance, Swiss Federal Institute of Technology in Lausanne researcher Johan Auwerx presented on work combining transcriptomics, metabolomics, and targeted proteomics to identify the molecular underpinnings of mouse metabolic phenotypes.
And MD Anderson Cancer Center researcher Sam Hanash announced that he and his colleagues have begun a 10,000-patient clinical trial to validate protein biomarkers for lung cancer that they plan to submit to the US Food and Drug Administration.
Discovered via profiling of patient plasma using isobaric tagging combined with ion mobility-based mass spec analysis on a Waters Synapt G2-S instrument, the markers could prove useful for a variety of purposes, including screening high-risk patients, helping to interpret CT scans, or allowing use of lower-dose CT scans, Hanash said.
In addition to the 10,000-patient US trial, he and his colleagues also plan to look at 10,000 patients from China, Germany, Brazil, and France.
Although proteomics has famously struggled to move into the clinic, identifying protein biomarkers by "directly profiling patient plasma" is "eminently feasible" with current technologies, Hanash said.