Human Proteome Project Revisited at HUPO; Methodology, Funding, Data Management Still Open Questions


This story originally ran on Sept. 28.

By Tony Fong

TORONTO — More than a year after a project to map out the human proteome was first proposed, members of the Human Proteome Project and attendees of the Human Proteome Organization's annual conference, which began here this past weekend, revisited the merits, challenges, and advisability of such a project.

In April 2008, HUPO first publicized its plans to launch the ambitious initiative as a follow-up to the Human Genome Project in order to "fill in the void between the genotype and the phenotype," a HUPO official told ProteoMonitor at the time [See PM 05/01/08].

The project was estimated then to last a decade and cost $1 billion. But indications that it would have trouble getting off the ground became clear at last year's HUPO conference when representatives from a wide swath of funding agencies, while appreciative of its ambitions, expressed skepticism about the price tag and its chances of attracting funding [See PM 08/21/08].

In the year since, not much has been publicly heard about the Human Proteome Project, until this week when organizers of the HUPO conference and some of the leading voices in proteomics debated how to best approach such an effort.

A session held on Saturday night to restart a dialogue on HPP kicked off with John Bergeron, a past president of HUPO and a professor of anatomy and cell biology at McGill University, saying that, in fact, HPP had already begun with Amos Bairoch's annotation and curation of more than 20,300 protein-coding genes a year ago at Swiss-Prot [See PM 09/04/08] and Matthias Uhlen's continuing work on the Human Protein Atlas.

Bairoch, formerly director of the Swiss-Prot database, is now head of a new resource called Computer and Laboratory Investigation of Proteins of Human Origin. Uhlen is president and vice president of microbiology at the KTH Royal Institute of Technology.

The phase that has not begun, and the aspect that remains undetermined, is how the mass spectrometry community and the quantitative data it can provide fits into the picture, Bergeron said.

But even before that, the kind of approach HPP should take has not yet been decided. The two main choices being considered are a gene-centric approach and a sample-centric focus, though some at the session also made a plea not to overlook certain chromosomes when and if work on HPP begins.

Most of the speakers on Saturday firmly threw their weight behind a gene-centric approach, which would encompass the determination of all isoforms, splice variants, and post-translational modifications of every protein encoded by the human genome. While such a strategy "is not very easy" because quantitative information will be needed, the potential benefits of such an approach could be profound, and what would be created would be of use not only to proteomics researchers and clinicians but also to researchers outside of the protein sciences, Uhlen said.

Pierre Legrain, secretary general of HUPO and director of life sciences at the French Commissariat à l’Énergie Atomique, also advocated for a gene-centric strategy. One goal of the HPP should be to accurately detect "any protein in any biological sample … with a final output to relate genotypes and phenotypes, or disease states, to the presence or absence of a protein or set of proteins," he said.

A database should compile all "certified" mass-spec data for all proteins, he added.

According to Legrain, the human proteome is divided into three groups – proteins whose identifications and functions are well known; those that may not be as well known but have been annotated and described; and proteins that are poorly known or not known at all.

The well-known proteins should serve as reference proteins in biological samples and their mass-spec data should be integrated into a deep annotation of the whole proteome, Legrain said. Meanwhile, basic knowledge of those proteins that have been annotated and described, and their variants, should be distributed and their validated mass-spec signatures should be published.

The final group, those proteins whose existence and/or nature remains a mystery, is where the HPP should direct its efforts, Legrain said. Their expected peptides should be defined and initial evidence of their existence be shown. Their distribution should be confirmed and characterized and then reported.

Not all supported an approach that would tie HPP's results to the genome, however. William Hancock, a professor of protein and analytical chemistry at Northeastern University, was one of the few who publicly argued for a sample-focused approach, which would map proteins and their variants in specific tissues in a disease-specific manner. Such a substantial investment in technology, he said, should result in insights into disease rather than a list of proteins and their variants.

And Mark Baker, a professor of chemistry and biomolecular sciences at Macquarie University and director of the Australian Proteome Analysis Facility, while not throwing his weight behind either side of the gene-centric versus sample-centric debate, said that ultimately the beneficiaries of a project like HPP should not be researchers but the taxpayers who would be paying for it, in the form of better disease diagnosis, prognosis, treatment, and management.

Funding Challenges

No matter what form HPP will take, however, its organizers still need to convince the funding community to bankroll it, and one year after funders gave the initiative a lukewarm assessment, their response is, at best, unclear. Patrick Kolar, in charge of genomics and systems biology funding for the European Commission, said that compared to a year ago, this weekend's session was a "positive discussion" with more people involved, more pointed questions asked, and details about business models being part of the dialogue.

He stopped short of saying money would be coming soon, though.

Sudhir Srivastava, chief of the cancer biomarkers research group in the division of cancer prevention at the National Cancer Institute, who during last year's meeting questioned any chance of HPP getting funded anywhere near the $1 billion level, this year again expressed concerns about the project and how proteomics is perceived by funding agencies.

Proteomics is still doing a poor job of convincing the broader community of its merits, and "we are not selling [it] to the funders very well," Srivastava said. In such an economic environment, especially, funding agencies need to be convinced about the benefits and desirability of such a large-scale and long-term project, he added.

Numerous aspects of proteomics are of concern to funders, he said, including the questionable return on investment for large-scale projects in the field; the lack of a business model; questions about its enduring value, for example whether it will help treat disease; and management of the data.

One recommendation he made was to have a summit of stakeholders, such as funders, vendors, and researchers, in order to discuss the many challenges and ways of addressing them. And the HPP, he said, needs to build on the successes of existing HUPO initiatives such as projects looking at proteins in blood, the brain, liver, and kidney.

HPP and its organizers also need to develop a comprehensive approach to map disease-specific proteins, continue work in developing standards, and work with the health profession in order to expedite translation of proteomics from the research setting to the clinic, Srivastava said.

Mike Snyder, a professor of genetics at the Stanford University School of Medicine, supported the idea of a proteomics summit. Drawing parallels to what happened in genomics, he said that at one time, people pooh-poohed the idea that the human genome could be sequenced, but once funding agencies got together with leaders in the genomics field and set a course on how it could be done, it laid the foundation for the Human Genome Project.

With so much work underway, several speakers said that there is no need, in essence, to reinvent the wheel, but rather, HPP needs to focus on what hasn't been achieved. In addition to identifying and characterizing unknown proteins, as Legrain pointed out, Tommy Nilsson, a professor of medicine at McGill University, pointed to the need for a physical repository for protein-relevant raw data to be deposited in a standardized manner.

While databases such as Tranche serve such a purpose now, a database specific to HPP will ensure that "what we have done and what we will do" won't become "roadkill," as much of the proteomics data has become, he said.

Such a repository may already exist, though. In February, the ProteomeExchange consortium was launched as a single point of submission to proteomics repositories. ProteomeExchange's core members are PeptideAtlas, Tranche, and PRIDE, and it contains mass spec-specific data.

So what shape could HPP take in the next few years? Legrain offered the following agenda: Next year, a working group devoted to developing an action plan for HPP should be created. In 2011, the pilot phase of the initiative would be launched with a two-year timetable to get some initial results and conclusions. And in 2014, a full-size HPP would be launched.

HUPO's role would be to initiate the formation of a working group and support HPP outside of HUPO. While HUPO would have an important role in HPP, it should not "own" the project or have governance over it, some said during Saturday's meeting.

Though proteomics is already a vibrant research field with countless projects under way, Snyder said that HPP has a place in the scheme of things.

"A centralized project will allow people to do things they can't by themselves," and would raise the standards of every proteomics researcher's own work, he said, just as the Human Genome Project did for genomics.

And in addition to producing deliverables, such as a set of reagents, HPP would be a catalyst for technology development. Again drawing on the Human Genome Project, he said that 20 years ago, people said that the technology then was not robust or advanced enough to map the human genome. Those people were wrong, obviously, and the subsequent success of the effort has led to leaps and bounds in genomics technology.

"I'd like to see that happen [with the HPP]," he said. "Here's what we'd like to do, let's see how we can do it."
