Scientists led by Mathias Uhlén, a researcher at Stockholm's Royal Institute of Technology (KTH), have released a new version of the Human Protein Atlas (HPA) database.
The new release – the twelfth edition of the HPA – contains antibody-based protein data for more than 80 percent of the human protein-coding genes as well as RNA expression data for more than 90 percent of these genes.
The atlas has also been restructured as four sub-atlases: the Normal Tissue Atlas, which contains information on protein expression in normal tissues; the Subcellular Atlas, which provides data on protein subcellular localization; the Cancer Atlas, which covers protein expression across different types of cancer; and the Cell Line Atlas, which contains protein expression information for different cell lines.
A paper detailing the new release was published last week in Molecular & Cellular Proteomics.
Launched in 2003 out of Uhlén's lab at KTH, the HPA project has grown significantly and currently involves more than 150 researchers across 17 countries. The effort is one of the primary pillars of the Human Proteome Organization (HUPO), providing a catalog of protein expression and localization data that supports a variety of HUPO initiatives, including the ongoing Chromosome-Centric Human Proteome Project. The atlas currently contains more than 13 million images of protein profiling in 46 different human tissues, along with RNA-seq data for 27 of these tissues.
One notable finding to emerge from the project's integration of its proteomic and transcriptomic data, Uhlén said, is that the two levels of information appear to be well correlated, which suggests that RNA-seq could prove an effective tool for validating proteomic discoveries.
"This question [of correlation between RNA and protein expression] is very interesting, and there are a lot of conflicting results in the literature [about it]," Uhlén told ProteoMonitor.
Based on the HPA data, "we claim, in general, that if you look at differences in mRNA levels between two tissues or two cell lines, they correlate quite well with the protein data," he said.
"Of course," he noted, "there will be exceptions to the rule, but we are very encouraged by the correlation that we see."
The HPA work also serves to confirm recent mass spec findings, Uhlén said. Over the last several years, a number of proteomics labs have published studies in which they have appeared to saturate the proteome detectable via mass spec. These studies have generally identified in the range of 11,000 to 13,000 proteins, suggesting that this is roughly the number of individual proteins expressed in a typical cell.
The HPA researchers have arrived at a similar number in their antibody-based work, Uhlén said.
"The mass spec community is starting to be able to go down very deep into the proteome [of] cell lines and tissues," he said. "And what we are seeing [in the HPA work] is a very similar picture – about 13,000 or 14,000 proteins or gene products expressed in cell lines or in tissues."
In addition to cataloguing and mapping the human proteome, Uhlén and his HPA colleagues are focused on antibody validation. As part of the project, the researchers have issued an open call to antibody providers, offering to validate their reagents using the HPA pipeline.
To date, Uhlén said, the team has validated around 14,000 outside antibodies. For those reagents that they find work well, the researchers publish the validation data along with links back to the provider.
Perhaps unsurprisingly, given the field's struggle with poorly performing (and in some cases outright fraudulent) antibodies, the majority of the reagents Uhlén and his group test do not, in fact, work, he said. In a 2009 analysis of antibodies from around 30 commercial providers, the researchers determined that roughly four out of 10 antibodies sent to them performed well. Since then, Uhlén said, that proportion has dropped to fewer than three in 10.
The drop likely does not reflect declining standards within the industry, he said, but rather the fact that in the project's first years providers sent in their more thoroughly validated products for testing.
"I don't think it's getting worse, I just think that it is not the star antibodies that we are testing now," he said.
Also with an eye towards helping researchers identify quality affinity reagents, Uhlén in 2009 launched the Antibodypedia data portal, which compiles vendor and researcher data on antibodies to various targets. Currently, he said, the portal contains data on more than 1 million antibodies from more than 60 providers, including data from roughly 500,000 primary validation experiments.
The bulk of this information has come from commercial vendors, he said, noting that despite much talk of the importance of antibody validation, the research community has lagged well behind in contributing such data.
"So far it has been very difficult to attract the [research] community to provide validation data," Uhlén said. "We need to work with the community to understand how we can get [researchers] to do what everyone says that they want to do — share the experience of using particular antibodies to particular gene products."
Moving forward, the HPA project has three main goals for the next two years, Uhlén said. The first is to use the recently integrated RNA-seq data to clean up the protein data.
"We can now very directly compare the RNA expression and protein expression, and we will use that as a tool to make a better quality atlas," he said.
The second is to add more antibody data to the Subcellular Atlas, which, Uhlén noted, cannot be effectively analyzed via RNA-seq because RNA expression data "doesn't tell you where the protein [ultimately] goes."
The third goal is the release of a mouse brain protein atlas, a project the researchers hope to complete by the end of 2014.
The project is funded primarily by the Knut and Alice Wallenberg Foundation, a private charitable organization that supports Swedish scientific research.