NEW YORK (GenomeWeb Daily News) – The human cytomegalovirus may be more complex than previously thought, according to research published in Science this week.
Using ribosome profiling, the researchers identified hundreds of previously unknown open reading frames (ORFs) in the virus, and thus possible new proteins encoded by it.
The human cytomegalovirus (HCMV), which infects nearly every human and can cause disease in immunocompromised adults as well as birth defects in newborns, was first sequenced about 20 years ago. Its genome is about 240 kb long and was estimated to contain between 165 and 252 ORFs.
"The genome of a virus is just a starting point," said senior author Jonathan Weissman from the University of California, San Francisco, in a statement. "Understanding what proteins are encoded by that genome allows us to start thinking about what the virus does and how we can interfere with it."
To study how HCMV ORFs are used over time, Weissman and his colleagues infected human foreskin fibroblasts with the virus and sampled the infected cells five, 24, and 72 hours after infection.
In addition, the researchers treated the cells with drugs to gauge ribosome positioning. Cycloheximide, a translation elongation inhibitor, was applied to examine the overall distribution of ribosomes; harringtonine and lactimidomycin were applied to make ribosomes accumulate at translation start sites rather than along the length of the message; and a third set of cells was left untreated.
This approach allowed the researchers to determine how genes are arranged in HCMV. For example, for the UL25 ORF, the researchers found a single transcription start site upstream of the ORF. In harringtonine- and lactimidomycin-treated cells, ribosomes marked a single initiation site at the first start codon downstream of that transcription start site, while in cycloheximide-treated and untreated cells, ribosome density extended up to the first in-frame stop codon.
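The UL25 example follows a simple rule: translation begins at the first start codon downstream of the transcription start site and runs to the first in-frame stop codon. The minimal Python sketch below applies that rule to a transcript sequence. It assumes a DNA-alphabet sequence (T rather than U), a known transcription start site, and a canonical ATG start; the function name `orf_from_tss` and the toy sequence are hypothetical, and a real analysis would locate the initiation site from harringtonine or lactimidomycin footprint peaks rather than assume the first ATG.

```python
# Stop codons in the standard genetic code (DNA alphabet).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def orf_from_tss(seq: str, tss: int, start_codon: str = "ATG"):
    """Hypothetical helper: return (start, end) of the ORF that begins at the
    first start codon at or after `tss` and ends at the first in-frame stop.

    Coordinates are 0-based and half-open; `end` includes the stop codon.
    Returns None if there is no start codon or no in-frame stop.
    """
    seq = seq.upper()
    start = seq.find(start_codon, tss)
    if start == -1:
        return None
    # Step codon by codon in the reading frame set by the start codon.
    for pos in range(start, len(seq) - 2, 3):
        if seq[pos:pos + 3] in STOP_CODONS:
            return start, pos + 3
    return None  # reached the end of the sequence without an in-frame stop


toy = "GGCATGAAACCCTAGTT"  # made-up transcript; transcription starts at 0
print(orf_from_tss(toy, 0))  # (3, 15): ATG AAA CCC TAG
```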
Using these ribosome footprints as a guide, the researchers identified hundreds of new ORFs: ORFs nested within known ORFs, out-of-frame ORFs, upstream ORFs, and ORFs that begin at near-cognate start codons, such as CUG rather than AUG.
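To make these categories concrete, the sketch below scans every reading frame for ORFs that begin at ATG or at a near-cognate codon (one differing from ATG at a single position, such as CTG) and labels each relative to an annotated ORF. It is an illustrative Python example, not the study's method: the function names, the category labels (including "N-terminal extension" for upstream starts that share the annotated ORF's stop codon), the two-codon minimum length, and the toy sequence are all assumptions.

```python
from itertools import product

# Stop codons in the standard genetic code (DNA alphabet).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def near_cognate_codons(canonical: str = "ATG") -> set:
    """Codons that differ from the canonical start codon at exactly one position."""
    return {
        canonical[:i] + base + canonical[i + 1:]
        for i, base in product(range(3), "ACGT")
        if base != canonical[i]
    }


def classify_orfs(seq: str, ref_start: int, ref_end: int, min_codons: int = 2):
    """Find ORFs starting at ATG or a near-cognate codon in every frame and
    label each relative to an annotated ORF spanning [ref_start, ref_end)."""
    seq = seq.upper()
    start_codons = {"ATG"} | near_cognate_codons()
    found = []
    for start in range(len(seq) - 2):
        codon = seq[start:start + 3]
        if codon not in start_codons:
            continue
        # Walk in frame from the start codon to the first stop codon.
        end = None
        for pos in range(start + 3, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                end = pos + 3
                break
        # Skip ORFs with no stop or fewer than `min_codons` sense codons.
        if end is None or (end - start) // 3 - 1 < min_codons:
            continue
        if start == ref_start:
            kind = "annotated"
        elif start < ref_start and end == ref_end:
            kind = "N-terminal extension"  # shares the stop, so same frame
        elif start < ref_start:
            kind = "upstream"
        elif start < ref_end and end == ref_end:
            kind = "internal in-frame"
        elif start < ref_end:
            kind = "internal out-of-frame"
        else:
            kind = "downstream"
        found.append((start, end, codon, kind))
    return found


toy = "CTGAAATAAGGATGCCCAAATGA"  # made-up: CTG uORF, then an ATG ORF at 11
for orf in classify_orfs(toy, ref_start=11, ref_end=23):
    print(orf)
```

On the toy sequence, the CTG-initiated ORF at position 0 is reported as upstream, and near-cognate codons in frame with the annotated ATG are reported as N-terminal extensions. In real data, ribosome footprints rather than sequence alone decide which of these candidates are actually translated.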
The researchers also annotated splice junctions and, using data from harringtonine-treated cells, in which ribosomes accumulate at start sites, developed a machine-learning strategy based on a support vector machine to uncover still more ORFs. This identified an additional 53 possible ORFs.
All in all, Weissman and his colleagues reported 751 translated ORFs in HCMV, of which 147 had previously been thought to be coding.
Many of the newly uncovered ORFs are very short, often less than 100 nucleotides long and many even shorter than 20 nucleotides, and they usually lie upstream of larger ORFs.
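The classifier's features and training details are not described here, so the following is only a generic sketch of how labeled start sites could train a linear support vector machine. It uses the Pegasos subgradient method, a standard SVM training procedure swapped in for illustration, and visits examples in a fixed order for reproducibility. The two features per candidate codon (log2 enrichment scores for harringtonine and lactimidomycin footprints) and all numbers are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def with_bias(x):
    """Append a constant 1.0 so the bias is learned as an ordinary weight."""
    return list(x) + [1.0]


def train_linear_svm(X, y, lam=0.01, epochs=100):
    """Minimize the L2-regularized hinge loss with Pegasos-style updates.

    X: feature vectors; y: labels in {-1, +1}. Examples are visited in a
    fixed cyclic order rather than sampled at random.
    """
    data = [(with_bias(xi), yi) for xi, yi in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        for xi, yi in data:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = yi * dot(w, xi)
            # Shrink toward zero: gradient step on the regularizer.
            w = [(1.0 - eta * lam) * wj for wj in w]
            if margin < 1.0:
                # Hinge loss is active: move w toward the correct side.
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
    return w


def predict(w, x):
    """+1 = predicted translation start site, -1 = background codon."""
    return 1 if dot(w, with_bias(x)) >= 0 else -1


# Hypothetical training data: (harringtonine log2 enrichment,
# lactimidomycin log2 enrichment) at candidate codons.
X_train = [(3.0, 2.5), (2.0, 3.0), (2.5, 2.0),       # known start sites
           (-1.0, -0.5), (-0.5, -1.5), (0.0, -1.0)]  # background codons
y_train = [1, 1, 1, -1, -1, -1]

w = train_linear_svm(X_train, y_train)
print(predict(w, (4.0, 4.0)), predict(w, (-2.0, -2.0)))  # prints: 1 -1
```

A linear SVM is used here only because it is the simplest instance of the technique; the scores it learns are a weighted combination of the footprint features, so candidate codons can be ranked by how strongly they resemble known start sites.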
"A key finding of our work is that each of these templates can encode more than one protein," said the Max Planck Institute of Biochemistry's Annette Michalski, in a statement.
Using high-resolution tandem mass spectrometry, the researchers confirmed that a number of proteins are produced from the newly identified ORFs.
The researchers also noted that viral genes, including the newly found ORFs, are tightly regulated over time. The use of alternative 5' transcript ends is "critical to the tight temporal regulation of viral genes expression and production of alternate protein products during infection," the study authors wrote.
"Our work yields a framework for studying HCMV by establishing the viral proteome and its temporal regulation, providing a context for mutational studies, and revealing the full range of HCMV functional and antigenic potential," they added.