NEW YORK (GenomeWeb) – A team led by researchers at Sweden's Royal Institute of Technology (KTH) has demonstrated a gene-specific correlation between mRNA and protein expression.
Described in a paper published last week in Molecular Systems Biology, the findings shed light on a question that has long puzzled scientists and could allow researchers to use transcript levels as proxies for protein expression, bringing new utility both to previously generated and future transcriptomic datasets.
That genes are transcribed into RNA, which is then translated into protein, is the so-called "central dogma" of molecular biology. One implication of the relationship between these molecules is that mRNA levels might be used to predict protein expression levels, which would be desirable given the relative ease and higher throughput of transcriptomic experiments compared to large-scale proteomics workflows.
In practice, however, researchers have struggled to establish a consistent relationship between mRNA and protein levels.
"There have been many very conflicting reports about the correlation between RNA and protein," Mathias Uhlén, professor of microbiology at KTH and senior author of the MSB study, told GenomeWeb. "Obviously we learned in school that from DNA you get mRNA you get protein, but [studies] have been conflicting as to whether you can use mRNA as a proxy for protein levels."
Quantitative analyses by Uhlén and his colleagues indicate that, in fact, transcripts can serve as proxies for protein expression in a gene-specific manner. Looking at absolute protein copy numbers and mRNA transcript numbers for 55 genes across a variety of human tissues and cell lines, the researchers found that these two levels of molecular information are well correlated but not in a straightforward one-to-one ratio.
Rather, for each gene there is a specific ratio of mRNA to protein that is maintained across cell types and conditions. Once established experimentally, these ratios can be used for predicting protein expression based on transcript levels.
"The beauty here is that this ratio between [transcripts and proteins] is independent of the cell line or tissue," Uhlén said. "Therefore, you don't have to do it in all tissues or cell types. You only have to do it in one."
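The arithmetic behind this idea can be sketched in a few lines of Python. This is only an illustration, not the authors' pipeline: the gene names, copy numbers, and transcript values below are invented, and the ratio is assumed to be expressed as protein copies per unit of mRNA.

```python
# Hypothetical illustration: gene names and all numbers are made up,
# not taken from the MSB study.

def rtp_ratio(protein_copies: float, mrna_level: float) -> float:
    """Gene-specific ratio, here assumed to be protein copies per unit of mRNA,
    measured once in a single reference tissue or cell line."""
    if mrna_level <= 0:
        raise ValueError("mRNA level must be positive")
    return protein_copies / mrna_level


def predict_protein(rtp: float, mrna_level: float) -> float:
    """Predict protein copies in another sample from its transcript level."""
    return rtp * mrna_level


# Step 1: measure protein and mRNA together in one reference sample.
# Each entry is (protein copies per cell, mRNA level).
reference = {
    "GENE_A": (2_000_000.0, 20.0),
    "GENE_B": (50_000.0, 50.0),
}
rtp = {gene: rtp_ratio(p, m) for gene, (p, m) in reference.items()}

# Step 2: in a different sample, only transcript levels are measured.
new_sample_mrna = {"GENE_A": 5.0, "GENE_B": 100.0}
predicted = {
    gene: predict_protein(rtp[gene], level)
    for gene, level in new_sample_mrna.items()
}

for gene, copies in predicted.items():
    print(f"{gene}: ratio {rtp[gene]:,.0f}, predicted protein copies {copies:,.0f}")
```

In this toy case the two genes have ratios that differ a hundredfold, so equal transcript levels would imply very different protein abundances, which is why a single genome-wide conversion factor would not work.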
Previous work has similarly suggested that it might be possible to identify consistent correlations between mRNA and protein. For instance, in 2013, researchers at the University of British Columbia published a study, also in MSB, indicating that taking protein synthesis and degradation rates into account could enable better correlations between transcripts and proteins.
Leonard Foster, a researcher at UBC and an author of that study, noted at the time that while transcriptomic data alone "give you slightly more than half the data you need to make a good prediction" of protein expression, adding synthesis and degradation information could bring the accuracy of transcript-protein correlations into the 80 percent range.
"We think that if you could do a widescale measurement of protein [synthesis and degradation] rates, then those could be reasonably added to a database like Uniprot as sort of de facto rates for that protein," Foster said.
"While that might not be precise for every cell type, it would at least be a pretty good initial starting point if you need to use that data for layering on top of a transcriptomic measurement," he added. "Someone would have to do those [synthesis and degradation] measurements, but once they are measured, they could be widely applied."
Uhlén and his co-authors likewise noted the role of protein synthesis and degradation rates, writing that their results show that the RNA-to-protein (RTP) ratios they calculated can vary "hugely between different genes, suggesting that one mRNA molecule in some cases can generate close to a million protein copies at steady state, while mRNA from other genes generate [on average] less than [a] thousand proteins under the same conditions."
"This is not surprising," they added, "since it is known that protein half‐lives can vary many orders of magnitude and that proteins also have different translational rates."
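A standard textbook model, offered here as an illustration rather than as the authors' own analysis, shows why half-lives and translation rates would set such a ratio. The symbols are assumptions introduced for this sketch: $M$ is the mRNA copy number, $P$ the protein copy number, $k_s$ the number of proteins synthesized per mRNA per unit time, $k_d$ the protein degradation rate constant, and $t_{1/2}$ the protein half-life.

```latex
\frac{dP}{dt} = k_s M - k_d P
\qquad\Longrightarrow\qquad
\left.\frac{P}{M}\right|_{\text{steady state}} = \frac{k_s}{k_d}
= \frac{k_s \, t_{1/2}}{\ln 2},
\qquad \text{since } k_d = \frac{\ln 2}{t_{1/2}} .
```

Under this simple model, a gene's steady-state protein-per-mRNA ratio depends only on its own translation rate and protein half-life, so large gene-to-gene differences in either quantity would translate directly into large differences in the ratio.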
Importantly, as Foster and his colleagues also noted, these synthesis and degradation patterns and the resulting RTP ratios appear universal across tissue and cell type, meaning that they only have to be calculated once.
Even this is not a trivial task, though, Uhlén said, noting that in the MSB study, he and his colleagues had calculated RTP ratios for only 55 proteins. To make their protein measurements, the researchers developed parallel-reaction monitoring mass spec assays using stable isotope-labeled recombinant protein fragments. They determined per-cell copy numbers by using histone content, based on a method put forth in 2014 by researchers at the Max Planck Institute of Biochemistry that uses the fixed proportion of histones and DNA in cells to calculate per-cell protein copy numbers.
Even with a complete set of RTP ratios, transcriptomic data would not provide information on matters like protein post-translational modifications or spatial organization, Uhlén noted, but the ability to accurately correlate transcripts and proteins would nonetheless be "very valuable for the community," he said.
"I think it would be a good idea to go through all the genes" and calculate their RTP ratios, he said. However, he added, his lab doesn't have the resources to undertake such a project on its own.
"We would love to do it, but the problem is, it is really a lot of work because you have to develop a targeted proteomics assay individually for every gene," he said. "It's quite cumbersome."
"We are absolutely suggesting that it be done by the community in some way, though," Uhlén said.
One possibility, he suggested, would be for a group like the Human Proteome Organization to take up the project, perhaps dividing the genome up amongst various interested members as part of the group's larger Human Proteome Project.