NEW YORK (GenomeWeb) – An American team led by researchers at the US Army Medical Research Institute of Infectious Diseases (USAMRIID) has come up with a proposed set of standards for investigators sequencing and assembling viral genomes.
In an editorial article published in the journal mBio last week, the researchers described five viral genome categories, ranging from standard draft to finished genomes. In addition to outlining the rationale and requirements for each category, they also described potential applications of viral genomes belonging to these classifications.
The guidelines were established with the help of co-authors at Johns Hopkins University, the University of Maryland, and the Broad Institute and J. Craig Venter Institute genome centers, as well as representatives from bodies such as the US Food and Drug Administration that oversee regulatory science related to viral countermeasures and vaccines.
"The biggest value for this manuscript is that it sets out standards that are accepted by the majority of the agencies that regulate the development of drugs, vaccines, and biologicals to control viral infections," senior author Gustavo Palacios, director of USAMRIID's Center for Genome Sciences, told In Sequence.
In developing the guidelines, the team got input from other groups that routinely generate and/or use viral genomes as well. For example, Palacios noted that the Filovirus Animal Non-clinical Group — an inter-agency group tasked with coming up with countermeasures standards related to filoviruses such as Ebola and Marburg — has already adopted the viral genome standards detailed in the paper.
Given the success that group has had implementing the genome standards so far, Palacios and his colleagues are optimistic that other researchers and viral countermeasures development teams may implement the guidelines in their own viral sequencing, assembly, and analysis pipelines.
By publishing the editorial, Palacios and his co-authors hope to reach a broad base of researchers who are involved in viral genome sequencing projects and may wish to share feedback on the proposed classifications.
The general approaches for tackling viral genomes are much the same as those used to sequence and assemble larger bacterial or eukaryotic genomes, the researchers noted, though there are a few special considerations.
For instance, it can be particularly tricky to tease viral RNA or DNA apart from other genetic material found in host cells, first author Jason Ladner, a post-doctoral researcher at USAMRIID's Center for Genome Sciences, explained.
"Even for viruses that replicate to very high copy numbers … the majority of the DNA or RNA that's going to be present in a sample is going to be from the host cells," he told IS. "That's one of the large complications: you're always sequencing a lot of host material along with the viral material."
The secondary structures found in some viral genomes, particularly those composed of RNA, can be difficult to deal with, too. Because such secondary structures are prone to low coverage, Ladner said, they can lead to breaks in the resulting genome assembly.
Nevertheless, as with other sequencing applications, the advent of high-throughput sequencing technologies has led to a rise in the number of sequenced viral genomes, which are being generated at an ever-faster pace.
"A lot of people are generating viral genomes and a lot of people are depositing viral genomes," Ladner said. "But right now there's no common vocabulary that people use to communicate to other labs and other researchers how finished a particular viral genome is."
Rather, most sequence repositories are relying largely on the expertise of researchers to place a viral sequence in the appropriate sub-database and determine its level of completion.
In an effort to standardize the descriptions of these genomes, Palacios, Ladner, and their co-authors started from a few basic assumptions about viral genomes, including information on viral genome structures, which tend to be highly conserved.
"Our goal is to define a common vocabulary that will allow comparison of genome quality across different research groups, sequencing platforms, and assembly techniques," they wrote.
The group ultimately proposed five categories for viral genome assemblies: a standard draft genome, high-quality draft genome, coding complete genome, complete genome, and finished genome.
In that classification context, the standard draft version of a viral genome would be fragmented, perhaps due to insufficient or variable coverage in the shotgun genome reads used to assemble it, the paper's authors noted.
Still, such sequences would be considered authentic draft genome assemblies, rather than partial viral sequences, as long as they contained at least one contig per viral genome segment.
By contrast, a viral genome qualifies as a high-quality draft once there are no remaining gaps in the contigs representing each viral segment, according to the team's classification scheme, though there may still be open reading frames that are not fully represented.
When an assembly does contain comprehensive coverage of every open reading frame in the viral genome, it falls in the coding complete category, which includes genome assemblies that are lacking only end sequences.
The group foresees a "complete" classification for viral genome assemblies that contain both complete open reading frames and end sequences. The "finished" genome classification is reserved for assemblies that factor in another layer of complexity: the variation found within viral sequences from a single sample.
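As a rough illustration of how the five categories build on one another, the criteria summarized above can be sketched as a short decision function. This is a minimal sketch based only on the descriptions in this article, not code from the mBio paper: the `SegmentAssembly` fields, the `classify` function, and the `diversity_characterized` flag are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SegmentAssembly:
    """Hypothetical summary of the assembly for one viral genome segment."""
    contigs: int          # number of contigs covering this segment
    orfs_complete: bool   # every open reading frame fully represented
    ends_resolved: bool   # segment end sequences have been determined


def classify(segments: List[SegmentAssembly],
             diversity_characterized: bool = False) -> str:
    """Return the category name, following the article's summary of the criteria."""
    # A segment with no contig at all means this is only a partial sequence.
    if not segments or any(s.contigs < 1 for s in segments):
        return "partial sequence"
    # More than one contig for a segment means gaps remain.
    if any(s.contigs > 1 for s in segments):
        return "standard draft"
    # Gap-free, but some open reading frames are incomplete.
    if not all(s.orfs_complete for s in segments):
        return "high-quality draft"
    # All open reading frames covered; only the ends are missing.
    if not all(s.ends_resolved for s in segments):
        return "coding complete"
    # Complete consensus; "finished" also requires within-sample diversity.
    if not diversity_characterized:
        return "complete"
    return "finished"


# Example: a two-segment virus where one segment is still split across two contigs.
example = [SegmentAssembly(1, True, True), SegmentAssembly(2, False, False)]
print(classify(example))  # standard draft
```

Note that none of the checks refers to a coverage threshold or a sequencing platform, which is in keeping with the platform-agnostic aim the authors describe.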
"When you're dealing with a [viral] sample — be it a clinical sample or a stock in the lab — you're sequencing hundreds or thousands of viral particles and you're going to have some genetic and genomic diversity within that sample," Ladner said.
"That diversity can be very important for some downstream applications such as testing of vaccines and therapeutics," he explained. "So we thought that it was really important to have a category that would capture the need to really dive into the diversity that's present in a sample — and not just give the consensus genome sequence."
The researchers involved in establishing the new classification criteria expect the latter viral genome category to be especially relevant for researchers developing viral countermeasures and/or testing a given viral "swarm" in animal models.
For example, Palacios noted that he and his colleagues at USAMRIID are using the finished genome level for those types of viral projects. "By knowing exactly what is the composition of the final swarm … and following changes that occur after challenge and during the infection, you can see the selective pressures that your counter-measure is generating in that viral population," he said.
But there are various other applications that are less dependent on complete or finished genome assemblies, the team explained, such as diagnostic tests focused on a particular portion of the genome, which should be possible with standard draft genomes provided the region of interest is present.
Generally speaking, the researchers argued that viral genomes in the high-quality draft category could prove useful for comparative genomic studies, while coding complete and complete assemblies may find favor with those interested in doing immunological assays and reverse genetics, respectively.
"If you want to go in and develop reverse genetic systems, not only do you need to have the complete open reading frame, but you need to have the very ends of the viral segments," Ladner said.
"These are often the trickiest areas to get when you're doing high-throughput sequencing," he explained, "and those are the areas you're going to have the lowest coverage for."
Beyond reverse genetic applications, the researchers noted that complete viral genomes should also be well suited to microbial forensics research or designing tests for viruses with very specific sequence characteristics.
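The pairings between categories and applications described above can be summarized as a simple lookup. The sketch below is illustrative only: the application labels, the `SUGGESTED_MINIMUM` table, and the `is_sufficient` helper are hypothetical, and it assumes the categories form a strict ladder in which a more complete assembly also serves any application suited to a less complete one. The researchers framed these pairings as suggestions rather than hard requirements.

```python
from enum import IntEnum


class Category(IntEnum):
    """The five proposed categories, ordered from least to most complete."""
    STANDARD_DRAFT = 1
    HIGH_QUALITY_DRAFT = 2
    CODING_COMPLETE = 3
    COMPLETE = 4
    FINISHED = 5


# Suggested pairings as reported in the article (labels are illustrative).
SUGGESTED_MINIMUM = {
    "targeted diagnostics": Category.STANDARD_DRAFT,  # if the region of interest is present
    "comparative genomics": Category.HIGH_QUALITY_DRAFT,
    "immunological assays": Category.CODING_COMPLETE,
    "reverse genetics": Category.COMPLETE,
    "microbial forensics": Category.COMPLETE,
    "countermeasure testing": Category.FINISHED,
}


def is_sufficient(assembly: Category, application: str) -> bool:
    """True if the assembly meets or exceeds the suggested category (assumed ordering)."""
    return assembly >= SUGGESTED_MINIMUM[application]


print(is_sufficient(Category.CODING_COMPLETE, "reverse genetics"))  # False
```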
The overall classification structure is meant to be agnostic to the sequencing technology and assembly approaches used, the researchers noted, making the standards applicable both to viral genomes already generated and to those yet to come.
"The idea behind the standards is that they should be agnostic of the platform and they should be prepared for future platforms," Palacios said.
"We don't focus on coverage thresholds and various other things, because those are all going to be platform-dependent," added Ladner. "We tried to make sure that the criteria that we used to define different categories were completely agnostic of those categories."