Oxford Researchers Adapt RNA-seq for Whole Viral Genome Sequencing from Biological Samples


A team from the University of Oxford and the Wellcome Trust Centre for Human Genetics has adapted RNA shotgun sequencing to the task of recovering and assembling the whole genomes of viruses in biological samples.

The group published a study in PLoS One last month describing the method's effectiveness compared to Sanger sequencing in recovering norovirus and hepatitis C virus genomes from blood and fecal samples. The authors reported that the method can recover close-to-complete viral genomes, as well as the sequences of multiple within-host variants of highly diverse viral pathogens.

According to the authors, efforts to categorize the genomes of circulating RNA viral pathogens like norovirus and hepatitis C have been hampered by the great genomic variation within virus populations and resulting difficulties in primer design, as well as problems culturing many of these organisms to obtain purified viral nucleic acids suitable for whole-genome sequencing.

Meanwhile, efforts to purify viral RNA from biological samples well enough to recover whole or close-to-whole genomes are complicated by the presence of contaminating RNA from other sources, the group wrote.

Derrick Crook, one of the study's senior authors, told In Sequence that most approaches to deal with these challenges have previously relied on target-specific, primer-based amplification. But such techniques are relatively expensive, labor-intensive, slow and inflexible, compared to the RNA-seq method Crook and his colleagues have developed, he said.

"It's been very difficult using short read technology to recover in a reliable way the complete genomes of circulating viruses," Crook said, both in terms of whether the whole viral genome is recovered and whether the sequencing is accurate in its base calling.

"In both those dimensions we have developed a technique that quite clearly achieves [these goals]," he said. "And we've shown that you can scale it up and do it on a fairly large number of samples reasonably rapidly."

The team's method combines RNA shotgun sequencing using the Illumina HiSeq or MiSeq, with bioinformatics analyses to recognize, extract, and assemble viral sequences from the heterogeneous group of all RNA sequences in a complex biological sample.
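The recognize-and-extract step can be illustrated with a toy sketch. This is not the authors' pipeline: it assumes exact k-mer matching as a stand-in for real read mapping, and the function names and the `k` and `min_shared` parameters are hypothetical, chosen only for illustration.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def select_viral_reads(reads, reference, k=8, min_shared=1):
    """Keep reads that share at least min_shared k-mers with the reference.

    Assumption: exact k-mer sharing approximates 'this read is viral'.
    Real samples need a proper aligner that tolerates sequence variation.
    """
    ref_kmers = kmers(reference, k)
    return [r for r in reads if len(kmers(r, k) & ref_kmers) >= min_shared]


def fraction_covered(reads, reference):
    """Fraction of reference positions covered by exactly matching reads."""
    covered = [False] * len(reference)
    for read in reads:
        start = reference.find(read)
        while start != -1:
            for i in range(start, start + len(read)):
                covered[i] = True
            start = reference.find(read, start + 1)
    return sum(covered) / len(reference)
```

Passing a mixed list of reads through `select_viral_reads` and then `fraction_covered` gives a rough analogue of the "percent of the reference genome recovered" figures reported in the study, under the simplifying assumption that reads match the reference exactly.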

In the study, Crook and his colleagues started with an initial proof of principle using the Illumina MiSeq, applying the method to three norovirus samples from feces and two HCV samples from blood. According to the researchers, the method produced genomes that were more than 97 percent complete relative to reference genomes, with minimal or no difference in sequence compared to Sanger sequencing.

For one norovirus sample, both the RNA-seq method and Sanger sequencing recovered 99.1 percent of the genome, and the base calls from the two methods were identical. A second sample also showed identical calling between the two methods. The third sample was chosen specifically to test whether the RNA-seq method might work where Sanger sequencing could not, and Sanger sequencing was indeed not possible because PCR failed to produce any products.

Only one HCV sample had a near-full length Sanger sequence available, and due to missing bases at the end of the genome, only 84 percent of the genome could be called by both methods, according to the authors. This sequence differed at two positions. In the second HCV sample only 15 percent of the genome was available as a partial Sanger sequence. Among this 15 percent, there were five single nucleotide variant differences and a one-base pair insertion in the Sanger results compared to the RNA-seq results.
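The kind of comparison described here, restricted to positions called by both methods, can be sketched as follows. This is a minimal illustration, not the study's analysis: it assumes the two sequences are already aligned to the same length and that uncalled positions are marked with `N`. The function name `compare_calls` is hypothetical, and insertions such as the one reported above would require a real alignment step that this sketch omits.

```python
def compare_calls(seq_a, seq_b, missing="N"):
    """Compare two pre-aligned sequences of equal length.

    Returns (fraction of positions called by both, list of mismatch positions).
    Assumption: 'missing' marks positions a method could not call.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Positions where both methods produced a base call.
    shared = [
        i for i, (a, b) in enumerate(zip(seq_a, seq_b))
        if a != missing and b != missing
    ]
    # Among shared positions, those where the two calls disagree.
    mismatches = [i for i in shared if seq_a[i] != seq_b[i]]
    return len(shared) / len(seq_a), mismatches
```

The first return value corresponds to the share of the genome callable by both methods, and the second lists the positions at which the two sets of base calls differ.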

To show that the method could be scaled to a high-throughput workflow, the team also sequenced an additional 61 norovirus samples in a single Illumina HiSeq 2000 batch, recovering more than 90 percent of the reference genome in all but one sample, according to the study authors.

The researchers also noted that the method requires only 12 PCR cycles for amplification, compared to over 30 cycles for amplicon- or hybridization-based methods.

Based on the team's experiments so far, Crook said the researchers believe that the workflow should currently take about a week and could be cut to a few days with further refinement.

According to the team, the mapping-based approach they have taken so far for recovery and assembly should be taken only as a prototype. Crook said his group is working on improving the bioinformatics side of the method, with plans to publish another paper focused on that work in the near future.

"The key step is to improve the assembly, or the recovery and assembly of viral sequences, because it is a minority of the sequenced reads that are viral in nature," Crook said.

"Since we published the paper, we've made huge strides in recovery and assembly of those reads and we are getting very good at it," he added. "But, we are just the first to publish this approach, and we are aware of other groups doing similar things, hopefully converging on what is a fairly straightforward conceptual approach.

"And we're happy others are replicating this work because it probably means it's going to be deployable," he added.

Crook said he and his team developed the method for Illumina technology because that is what they are limited to in their research environment.

"I can't predict how it would work with other [sequencing platforms] with any certainty, but I would imagine with any platforms suitable for RNA-seq you would be able to modify the technique," he said.

Additionally, Crook said that as methods for strand or nanopore sequencing advance further, they might become even better suited to this type of work.

"As the concept of strand sequencing goes beyond the aspirations of startup companies and becomes reality, it's likely that that approach would be even better," he said.

According to the study, nanopore sequencing could have advantages over current sequencing technology such as longer read lengths, quicker turnaround times, and simple sample preparation requiring less input material.

"Complete genome sequences may ultimately be obtained directly from clinical samples using these enhanced sequencing platforms and improved bioinformatics analysis in clinically relevant time-frames," the authors wrote, which would "revolutionize the diagnosis of viral infections and would also promote new avenues of research into virus evolution, antiviral resistance and personalized medicine approaches to treating specific viral genotypes."