NEW YORK (GenomeWeb) – Researchers at the Broad Institute have developed an RNA-seq strategy suited to constructing de novo assemblies of Lassa and Ebola viral genomes from biological samples.
The method was published recently in Genome Biology and uses RNase H-based digestion to remove contaminating poly(rA) carrier RNA and rRNA. In addition, the group has developed a hybrid enrichment protocol to enhance the viral content of sequencing libraries.
The Broad may also deploy this method as part of its collaboration with Illumina and the US Agency for International Development to help with Ebola genome surveillance in several West African countries.
Lead author Christian Matranga told GenomeWeb that the strategy evolved after the group tried to sequence the transcriptomes of Lassa viruses directly from clinical samples. That sequencing yielded very few reads that aligned to the Lassa virus. The majority of the reads were either human rRNA or poly(rA) carrier RNA, which is introduced during a step that inactivates the virus in clinical samples.
To remove the contaminating poly(rA) carrier RNA and rRNA, the team developed a targeted RNase H-based depletion method. Matranga said that other groups have previously used RNase H-based depletion methods to get rid of rRNA in transcriptome sequencing studies, including researchers from Genomic Health, who originally described the method, applied to formalin-fixed paraffin-embedded samples, in PLOS One in 2012.
Matranga said the general concept is the same, but with some minor tweaks that make it specific for working with viral genomes. For instance, he said, the issue of carrier RNA byproduct is a problem specific to working with viral samples.
Standard operating procedures for collecting clinical viral samples often involve inactivating the virus in a buffer that contains "long stretches of poly ribosomal A RNA," he said.
And when making RNA-seq libraries, the poly(rA) becomes a substrate for the cDNA synthesis. "It was actually being incorporated into the cDNA libraries," he said. That is an issue, because those long homopolymeric stretches cause low-quality sequencing reads.
The carrier RNA is often a necessary component of the inactivation buffer because it helps prevent RNA loss. "You always want to add some sort of carrier," Matranga said, but at the same time, when doing RNA-seq, "it comes up in our libraries."
To remove that carrier RNA, Matranga said that the group tweaked the RNase H-based depletion step to take advantage of a feature of RNase H: the enzyme degrades only RNA that is hybridized to DNA.
Before treating with RNase H, the group designed DNA probes that are complementary to rRNA, creating ones to target each rRNA species, Matranga said. Then, the group treated the hybridized sample with RNase H, which removed the rRNA. To remove the poly(rA), the group performed a similar hybridization reaction with oligo(dT).
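The probe logic described above can be sketched in a few lines of Python. This is an illustration only, not the group's design pipeline: the rRNA fragment, the 12-nucleotide probe length, and the function names `antisense_dna` and `tile_probes` are all hypothetical. It shows how an antisense DNA probe is derived from an RNA target, and why the probe against a poly(rA) carrier is simply oligo(dT).

```python
# Illustrative only: sequences and probe length are made up, not from the study.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "U": "A"}  # RNA base -> pairing DNA base


def antisense_dna(rna: str) -> str:
    """Return the DNA strand (5'->3') that would hybridize to the given RNA."""
    return "".join(COMPLEMENT[base] for base in reversed(rna.upper()))


def tile_probes(rna: str, probe_len: int) -> list[str]:
    """Split the target RNA into windows and return one antisense probe per window."""
    probes = []
    for start in range(0, len(rna), probe_len):
        window = rna[start:start + probe_len]
        probes.append(antisense_dna(window))
    return probes


rrna_fragment = "GGCUACCACAUCCAAGGAAGGCAG"  # hypothetical 24-nt stretch of rRNA
rrna_probes = tile_probes(rrna_fragment, probe_len=12)

# A poly(rA) carrier pairs with a run of T's, i.e. oligo(dT).
carrier_probe = antisense_dna("A" * 20)
```

Once the DNA probes are annealed, the RNA in each RNA:DNA hybrid becomes a substrate for RNase H, while unhybridized RNA, including viral RNA, is left intact.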
After depleting rRNA and the poly(rA), standard RNA-seq can be performed, Matranga said. The group found that the RNase H depletion step resulted in at least a five-fold increase in viral reads compared to simply sequencing the clinical samples directly and, in the best case, a 70-fold enrichment, he said.
Aside from enabling viral sequencing from clinical samples, Matranga said the technique should enable metagenomic studies. Often, patients with Lassa virus have co-infections with either another virus or bacteria, he said. The unbiased RNA-seq approach will include reads not only from the host and viral genomes, but also any other viral or bacterial genomes that are present, he said.
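For readers unfamiliar with how such fold-enrichment figures are typically derived, the arithmetic can be sketched as the ratio of the viral read fraction after treatment to the fraction before it. The read counts below are hypothetical and chosen only to produce a roughly five-fold result; the study's own counts and exact calculation may differ.

```python
def viral_fraction(viral_reads: int, total_reads: int) -> float:
    """Share of all sequenced reads that align to the virus."""
    return viral_reads / total_reads


def fold_enrichment(fraction_before: float, fraction_after: float) -> float:
    """How many times larger the viral fraction is after treatment."""
    return fraction_after / fraction_before


# Hypothetical counts, not data from the study.
untreated = viral_fraction(viral_reads=200, total_reads=1_000_000)
depleted = viral_fraction(viral_reads=1_000, total_reads=1_000_000)

enrichment = fold_enrichment(untreated, depleted)
print(f"{enrichment:.1f}-fold enrichment")
```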
However, when the Broad group transfers their protocols to collaborators' labs in West Africa that want to use the methods to study clinical samples from the Ebola outbreak, Matranga said that they will likely incorporate a hybrid selection step to enrich for Ebola viral content even more. Those labs will likely be doing the sequencing on Illumina's MiSeq systems, rather than the higher-throughput HiSeq instrument, Matranga said.
The drawback of hybrid selection is that "you lose all the metagenomic content and the host content that you would get from standard RNA-seq," Matranga said, but, "you gain many more viral reads."
Incorporating hybrid selection will enable de novo assembly of the viral genome and also variant calling with many fewer reads, he added. "We'll be able to get a picture of every last molecule of Lassa or Ebola with just one MiSeq run of many samples."
In the study, despite depleting rRNA and carrier RNA from the Lassa virus samples, in many cases fewer than 1 percent of reads were viral, making assembly and variant calling cost prohibitive.
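A back-of-the-envelope calculation shows why a low viral fraction drives up cost. The sketch below assumes a simple model in which every viral read contributes its full length to coverage; the genome length, read length, and target coverage are hypothetical placeholders, not values from the study.

```python
import math


def total_reads_needed(genome_len: int, read_len: int,
                       target_coverage: float, viral_fraction: float) -> int:
    """Total reads to sequence so that the viral reads alone reach the target coverage."""
    viral_reads = target_coverage * genome_len / read_len
    return math.ceil(viral_reads / viral_fraction)


# Hypothetical parameters: a 10,000-nt genome, 100-nt reads, 100x coverage.
for fraction in (0.5, 0.01, 0.001):
    reads = total_reads_needed(10_000, 100, 100, fraction)
    print(f"viral fraction {fraction:>6.1%}: about {reads:,} total reads")
```

Under this model, the number of reads required scales inversely with the viral fraction, so a ten-fold drop in viral content means roughly ten times as much sequencing for the same coverage.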
To do hybrid selection, the group designed 42,000 100-mer oligonucleotides based on a diverse set of consensus Lassa genomes and tested the hybrid selection on 13 Lassa virus libraries that had previously been sequenced. On average, hybrid selection enriched viral content in the sequencing data by 86x.
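One common way to build such an oligonucleotide set is to tile fixed-length windows across each reference genome and keep each distinct sequence once. The sketch below illustrates that general idea with toy data. It is not the group's design procedure: the 50-nt step, the random 300-nt "genomes," and the function name `tile_baits` are assumptions, and the sketch ignores strand orientation and any filtering a real design would apply.

```python
import random


def tile_baits(genomes: dict[str, str], bait_len: int = 100, step: int = 50) -> list[str]:
    """Slide a window across each genome and collect the distinct bait sequences."""
    seen = set()
    baits = []
    for seq in genomes.values():
        for start in range(0, len(seq) - bait_len + 1, step):
            bait = seq[start:start + bait_len]
            if bait not in seen:  # windows shared between genomes are kept once
                seen.add(bait)
                baits.append(bait)
    return baits


# Toy data: a random 300-nt sequence and a variant with one substitution.
random.seed(0)
genome_a = "".join(random.choices("ACGT", k=300))
substitute = "C" if genome_a[10] != "C" else "G"
genome_b = genome_a[:10] + substitute + genome_a[11:]

baits = tile_baits({"consensus_a": genome_a, "consensus_b": genome_b})
print(len(baits), "distinct 100-mers")
```

Because divergent genomes contribute windows that differ from one another, tiling a diverse set of consensus sequences yields baits that can capture a range of viral variants.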
The researchers initially designed the method for Lassa virus samples, but as they were finishing up, they were asked to test the method on Ebola samples as well. Looking at four clinical samples, they found that the approach was able to lower rRNA contamination from greater than 80 percent to less than 0.5 percent, resulting in an enrichment of Ebola virus content of 13- to 24-fold.
Matranga said that the researchers are also planning to use the Illumina Nextera XT kit for library construction to speed up the process.
The researchers wrote in the study that they have now used this approach on 99 clinical samples from 78 patients in Sierra Leone, sequencing the viral genomes to approximately 2,000-fold coverage. Incorporating the Nextera library construction method decreased the "overall process time three-fold," the authors wrote. "We were thus rapidly able to make our data available to the community, to enable timely insights for surveillance and control efforts, and to inform diagnostic and therapeutic developments during the epidemic," they added.