A team led by researchers from the University of California, San Francisco and the Max Planck Institute of Biochemistry has completed a proteomic characterization of the human cytomegalovirus (HCMV) using a combination of ribosome profiling and mass spec analysis.
Detailed in a paper published last week in Science, the study added considerably to the current understanding of the HCMV proteome, discovering several hundred previously unidentified open reading frames (ORFs) and confirming a number of protein products from those ORFs via mass spectrometry.
The project also offers another example of the ongoing move to combine genomic and transcriptomic research with proteomics to better understand how genomic elements translate into biologic function.
"By and large, the information we care about when we get a genome is what proteins it is producing, because those proteins then go on to impact the function of the cell," Jonathan Weissman, a UCSF researcher and study author, told ProteoMonitor. In the case of HCMV, researchers have had the sequence of the virus' genome for roughly 20 years, he noted, but "haven't really known what proteins it is producing" due to its genomic complexity.
Weissman and his colleagues approached the HCMV proteome via ribosome profiling, a technique his lab developed that enables monitoring of protein translation on a genome-wide scale.
Essentially, the method uses an agent such as cycloheximide to freeze the mRNA-ribosome complexes performing protein synthesis. That is followed by ribonuclease digestion to isolate the portions of mRNA directly bound to the ribosome, which can then be sequenced to determine, genome-wide, exactly which stretches of mRNA are being translated at a given time.
The technique, Weissman said, lets researchers "catch [ribosomes] in the act of turning RNA into proteins," allowing them to "look directly, comprehensively, and with high precision both at what proteins are being produced and at what times they are being produced."
In the Science paper, Weissman and his collaborators used the technique to study temporal changes in the HCMV proteome, infecting human foreskin fibroblasts with the virus and profiling the ribosome-protected mRNA at 5, 24, and 72 hours after infection. They identified 751 total ORFs, more than triple the roughly 160 to 250 ORFs traditionally thought to comprise the HCMV genome.
Weissman said that although he anticipated the project would discover some previously unidentified proteins, he was surprised at just how many novel molecules they found.
"I think it was well appreciated in the field that our understanding of what proteins were likely to be made by this very complex virus was likely to be incomplete," he said. "But we were surprised by how much new there was."
As the researchers wrote, the virus's use of alternate transcript start sites lets it translate a number of different proteins "from a single genomic locus," with different start sites predominating at different stages of the infection process.
Weissman suggested that, while HCMV "might be a bit of an extreme case," similar complexity is likely present across all organisms. His lab's ribosome profiling approach, he added, is applicable to any kind of cell.
Of the newly identified ORFs, 245 were very short, consisting of 20 or fewer codons; 239 consisted of between 21 and 80 codons; and 120 consisted of more than 80 codons.
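As a quick check on those figures, the short Python sketch below sorts ORFs into the three size classes the article describes and totals the reported counts. The class labels and the function name are illustrative assumptions, not terms from the paper.

```python
# Size classes follow the cutoffs quoted above (20 and 80 codons).
# The labels and function name are hypothetical, chosen for illustration.

def orf_size_class(codons: int) -> str:
    """Return the size class for an ORF of the given length in codons."""
    if codons <= 20:
        return "very short (20 or fewer codons)"
    if codons <= 80:
        return "short (21 to 80 codons)"
    return "long (more than 80 codons)"

# Counts of newly identified ORFs as reported in the article.
reported_counts = {
    "very short (20 or fewer codons)": 245,
    "short (21 to 80 codons)": 239,
    "long (more than 80 codons)": 120,
}

total_new_orfs = sum(reported_counts.values())
print(total_new_orfs)  # 604
```

By these counts, 604 of the 751 ORFs were newly identified, and roughly 40 percent of the new ORFs fall in the shortest class.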
In addition to the ribosome profiling data, the researchers also looked for evidence of these newly identified proteins via mass spec, collaborating with the lab of Max Planck researcher Matthias Mann to do proteomic discovery work on a Thermo Fisher Scientific Q Exactive instrument.
Given the Weissman lab's ribosome profiling work, "it was kind of an obvious step to go to the proteome level and really look for these specific proteins that they were predicting," Annette Michalski, a Max Planck researcher and author on the paper, told ProteoMonitor.
Obtaining sufficient proteome depth via mass spec proved a challenge, however, she noted, particularly given that the HCMV proteins had to be detected against a background of proteins from the infected human cell.
"These were very, very complex samples," Michalski said, noting that the speed and high mass accuracy of the Q Exactive were key to obtaining a level of coverage suitable for the work.
"It's only now [become] possible with the technology we have to actually identify many of these newly predicted, novel open reading frames, as proteins," she said. She added that the researchers had started the project using an earlier-generation Orbitrap machine, and that moving to the Q Exactive "really gave a boost to the entire analysis."
Mass spec proteomics is emerging as a tool for researchers to confirm and better understand the protein outputs and functional impact of observed genomic variation and complexity (PM 11/30/2012). As Weissman noted, though, the technology still has its limitations. In the case of the Science study, the digestion-based discovery proteomics approach taken by the Max Planck team was poorly suited to detecting the very short proteins predicted by the ribosome profiling.
Because the proteins in a conventional bottom-up proteomics experiment are digested into peptides prior to mass spec analysis, it can be difficult to distinguish between, for instance, full-length and truncated forms of a protein.
"The big finding with the ribosome profiling was that there were many really, really short open reading frames," Michalski said. "And with mass spectrometry – since we do the digest with trypsin, it of course becomes more and more difficult to get to these very short proteins."
Given this, the researchers confined their mass spec analyses to proteins longer than 55 amino acids. In total, they investigated the products of 96 new genomic loci identified by the ribosome profiling and detected 53 previously unidentified proteins.
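To illustrate why trypsin digestion works against very short proteins, the Python sketch below performs a simple in-silico digest. It is not the Max Planck team's pipeline. It assumes the common rule that trypsin cleaves after lysine (K) or arginine (R) unless the next residue is proline (P), and it assumes, purely for illustration, that peptides shorter than seven residues are too short to identify confidently. The sequences are made up.

```python
MIN_PEPTIDE_LEN = 7  # assumed detectability cutoff, for illustration only

def tryptic_peptides(protein: str) -> list:
    """Split a protein after each K or R that is not followed by P."""
    peptides = []
    start = 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        # Keep the C-terminal remainder that does not end in K or R.
        peptides.append(protein[start:])
    return peptides

def detectable_peptides(protein: str) -> list:
    """Keep only peptides that meet the assumed minimum length."""
    return [p for p in tryptic_peptides(protein) if len(p) >= MIN_PEPTIDE_LEN]

# Hypothetical sequences, not taken from the HCMV proteome.
short_protein = "MSTKLLRAGK"
longer_protein = "MAAAAAAAKGGGR"

print(tryptic_peptides(short_protein))
print(detectable_peptides(short_protein))
print(detectable_peptides(longer_protein))
```

In this toy case the 10-residue protein yields only fragments below the cutoff, so nothing would be left to identify, which is the limitation Michalski describes.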
The findings, Weissman said, now provide a starting point for researchers to further investigate the functional roles of the newly identified proteins.
"In the same way that when we sequence a genome… when we use ribosome profiling [and mass spec] to understand what proteins are being encoded by that genome, it's a starting point for understanding function," he said.