SAN FRANCISCO (GenomeWeb) – Direct RNA sequencing and pushing read lengths to over one megabase emerged as two hot areas for nanopore sequencing at this week's Advances in Genome Biology and Technology meeting in Orlando, Florida. In addition, one researcher reported on his lab's use of the PromethIon, Oxford Nanopore's higher-throughput nanopore instrument.
Several researchers presented on their experience using Oxford Nanopore Technologies' MinIon device to directly sequence RNA, an application the firm first described in 2016 and published last month in Nature Methods.
In addition, Josh Quick, a doctoral student in Nick Loman's laboratory at the University of Birmingham, discussed a protocol his group developed to generate read lengths greater than 1 megabase, which has sparked competition among labs to generate the longest read lengths and the longest N50 read length.
To sequence RNA directly on the MinIon, RNA is first ligated to a double-stranded adapter that targets the poly A tail of the RNA molecule. Next, a reverse transcription step creates a DNA-RNA hybrid molecule, followed by an optional cDNA elongation step. The sequencing adapter carrying the motor protein is then ligated, and the RNA strand is pulled through the pore.
Rachel Workman, a researcher in Winston Timp's laboratory at Johns Hopkins University, described efforts by the Nanopore RNA Consortium, a group of six laboratories working on RNA sequencing on the MinIon. In total, the consortium has generated 13 million direct RNA sequences from 30 MinIon flow cells and more than 24 million cDNA sequences from 12 flow cells.
The group compared direct RNA sequencing of a cell line to the GENCODE reference, finding that 83 percent of the reads aligned and read identity was 87 percent. Workman said that the group also compared direct RNA sequencing to cDNA sequencing on the MinIon and to an Illumina cDNA dataset, finding correlations of 0.875 and 0.776, respectively.
The group also evaluated spiked-in synthetic RNA molecules. Workman noted that the method worked well for quantifying genes, with the level of observed genes correlating well with what was expected. At the isoform level, however, quantification was more complicated due to errors in the reads, multiple isoforms mapping to the same locations, and degradation of the 5' ends of the molecules.
Nonetheless, Workman noted that direct RNA sequencing offers a number of benefits, including the ability to see multiple isoforms, detect splice variants, estimate the length of the poly A tail, and detect RNA modifications without having to do chemical labeling or antibody pull-down.
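As an illustration of how a platform-to-platform comparison like this could be computed, the following is a minimal sketch, not the consortium's actual method. The article does not say which correlation statistic was used; this sketch assumes a Pearson correlation on log-transformed per-gene read counts, and the gene names and counts are hypothetical toy values.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)

def log_counts(counts):
    """log2(count + 1), so that a few highly expressed genes do not dominate."""
    return [math.log2(c + 1) for c in counts]

# Hypothetical per-gene read counts from two platforms (toy values only,
# not data from the talk).
direct_rna = {"geneA": 120, "geneB": 15, "geneC": 900, "geneD": 0, "geneE": 48}
cdna = {"geneA": 260, "geneB": 22, "geneC": 1500, "geneD": 3, "geneE": 130}

# Compare only genes present in both datasets, in a fixed order.
genes = sorted(set(direct_rna) & set(cdna))
r = pearson(log_counts([direct_rna[g] for g in genes]),
            log_counts([cdna[g] for g in genes]))
print(f"Pearson r on log2 counts across {len(genes)} genes: {r:.3f}")
```

A rank-based statistic such as Spearman correlation would be an equally plausible choice for expression data; the log transform here is one common way to keep a handful of abundant transcripts from driving the result.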
Workman said that the method could "supplement existing methods," and also help "create a more robust transcriptome-wide map."
It was a "promising first glance at the data," she added.
Matthew Keller, a postdoctoral fellow on the Influenza Genomic Team at the US Centers for Disease Control and Prevention, has also been testing direct RNA sequencing on the MinIon with the goal of using the technique to sequence influenza virus. As a first step, he said, his group sequenced Enolase, which comes in Oxford Nanopore's Direct RNA kit. The results were good, with full coverage, reads the expected length of 1,315 bases, and 99.7 percent consensus identity.
To sequence the influenza virus, however, several modifications had to be made to the standard protocol. Since influenza does not have a poly A tail, the initial ligation step had to be changed. Instead, the group made use of two conserved regions at the 3' and 5' ends of the virus to develop a modified adapter that would complement a conserved region.
As a first test, the group sequenced an H1N1 viral strain from Puerto Rico, which is commonly used for research because it's available in large quantities. The virus was first propagated in chicken eggs and then extracted through the allantoic fluid. They then tested two methods for extracting RNA from the allantoic fluid. One method is more time intensive but produces pure virus, while the other is much quicker but yields virus that is not as pure.
For both the pure and crude samples, sequencing yielded full coverage of all eight segments of the virus. When comparing to a sample sequenced on the Illumina MiSeq, each method was 99 percent identical. Both samples also yielded the expected read lengths for each of the eight segments, but the crude sample was just a bit noisier than the pure sample.
The reads, however, were significantly less accurate than the MiSeq reads at around 85 percent, and even significantly less accurate than the direct RNA sequencing of Enolase, which yielded 90 percent accuracy.
"That tells us there's room for improvement" in the bioinformatics, Keller said.
In addition to increasing the accuracy, he said it would be critical to reduce the starting material. In this experiment, he said his group used 500 nanograms of input virus, an amount that is not feasible in a clinical setting.
Overall, "long-read direct RNA sequencing is a unique and transformative technology," he said.
Aside from direct RNA sequencing, a long-anticipated application of nanopore sequencing has been the ability to generate very long reads. The University of Birmingham's Quick has been especially focused on methods at the front end to enable very long reads. The key, he said at AGBT this week, is in preparing the DNA. The "simplest methods are fast, but they generate DNA under 100 kilobases," he said. Instead, slow pipetting is required because high molecular weight DNA is fragile and very prone to shearing.
To generate long DNA molecules, Quick first uses what's known as the Sambrook method to extract DNA. It's a protocol that is used in molecular cloning and involves ethanol precipitation, which allows for the DNA to be resuspended at the desired concentration. To prepare libraries, Quick noted that standard ligation methods result in shorter DNA molecules. Recently, Oxford Nanopore developed a so-called rapid protocol, which uses transposases, has fewer steps, and requires no cleanup at the end, resulting in longer DNA fragments, he said.
To increase the DNA fragment lengths further, though, he modified the protocol. Although using a transposase to generate long reads "seems counterintuitive," since transposases cut DNA, the key was in the ratio of transposase to DNA molecules. In the standard rapid protocol, multiple transposases bind to and cut one DNA molecule. But in the revised protocol, by using just enough transposase so that each DNA molecule is only cut by one enzyme, "unnecessary fragmentation" is avoided.
In an initial test, he sequenced Escherichia coli, generating 5 gigabases of sequence data with a mean read of 33.3 kb, with the longest read stretching to 778 kb.
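To put those figures in perspective, here is a rough back-of-the-envelope sketch of what 5 gigabases at a 33.3 kb mean read length implies. The yield and mean read length come from the talk; the genome size of about 4.6 megabases is an assumption (typical of E. coli K-12 laboratory strains), since the strain was not stated.

```python
def approx_read_count(total_bases, mean_read_length):
    """Approximate number of reads implied by a total yield and a mean length."""
    return total_bases / mean_read_length

def mean_depth(total_bases, genome_size):
    """Average fold-coverage if the bases were spread evenly over the genome."""
    return total_bases / genome_size

TOTAL_BASES = 5e9                # reported yield: 5 gigabases
MEAN_READ_LENGTH = 33_300        # reported mean read length: 33.3 kb
ASSUMED_GENOME_SIZE = 4_600_000  # ASSUMPTION: ~4.6 Mb E. coli genome, strain not stated

reads = approx_read_count(TOTAL_BASES, MEAN_READ_LENGTH)
depth = mean_depth(TOTAL_BASES, ASSUMED_GENOME_SIZE)
print(f"about {reads:,.0f} reads, about {depth:,.0f}x mean depth")
```

Under that assumed genome size, the run works out to roughly 150,000 reads and on the order of 1,000-fold coverage, far more than is needed to assemble a bacterial genome.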
He said that the group has also begun using the protocol to sequence a human genome as part of the Nanopore Human Genome Consortium, which consists of groups from seven laboratories in the UK, US, and Canada. The group initially presented data from its work at an Oxford Nanopore-sponsored workshop and in a preprint on the BioRxiv server, and published its results in Nature Biotechnology.
In that work, the researchers generated 5x coverage of the human genome using the ultra-long read protocol, obtaining an N50 read length of 99.8 kb. Adding the ultra-long reads to the rest of the data — approximately 30x coverage of the genome — increased the contig N50 to 6.1 megabases from 4.3 megabases. Since then, Quick said the group has continued to improve on the dataset and now has around 10x coverage using the ultra-long protocol. That has further improved the contig N50 to 10.2 megabases, he said.
Quick noted that there has been somewhat of a competition among researchers working on the MinIon to generate the longest read length. Currently, he said the record is 1.2 megabases. However, he noted that perhaps the more important metric is the N50, since it is indicative of the entire dataset, rather than just a small number of reads. The current record for that is at 127 kb, he said, but predicted that in the future long reads would routinely be generated for samples and for other sample types, such as metagenomes.
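Quick's point about N50 versus the single longest read can be made concrete with a short sketch. The function below uses the usual definition of read-length N50 (the length L at which reads of length L or longer contain at least half of all sequenced bases); the read lengths are hypothetical toy values, not data from the talk.

```python
def read_n50(lengths):
    """Read-length N50: the length L such that reads of length >= L
    together contain at least half of all sequenced bases."""
    if not lengths:
        raise ValueError("need at least one read length")
    total = sum(lengths)
    running = 0
    # Walk from the longest read down, accumulating bases until half is reached.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length

# Toy dataset (hypothetical): one record-setting 1.2 Mb read,
# twenty 100 kb reads, and fifty 20 kb reads.
toy_lengths = [1_200_000] + [100_000] * 20 + [20_000] * 50

print("longest read:", max(toy_lengths))   # 1200000
print("read N50:", read_n50(toy_lengths))  # 100000
```

In this toy dataset the single 1.2-megabase read sets the "longest read" figure, but the N50 of 100 kb reflects where most of the sequenced bases actually sit, which is why it says more about the dataset as a whole.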
Already, some researchers are working on using nanopore sequencing for metagenomics. Ken McGrath at the Australian Genome Research Facility discussed his lab's early-access use of the PromethIon, a nanopore instrument with 48 flow cells that Oxford Nanopore recently made commercially available. McGrath said the lab has gotten the instrument up and running after some initial technical problems, in which connecting it to the server caused the lab's network to shut down, something he ultimately said was perhaps his fault for not reading the manual.
McGrath said he is currently using the instrument as part of the Metagenomics Research Group, and as part of work for that consortium he has used it to sequence a control sample of 10 organisms. That sample can be used, he said, as a spike-in control or to evaluate sequencing platforms.
In comparing the PromethIon with the Illumina HiSeq, the PromethIon generated 198,000 mapped reads to Illumina's 29 million. But the types of reads were very different. While Illumina sequencing, due to the sheer volume of reads it generates, would be better for identifying rare organisms in a sample, the PromethIon, due to its longer reads, has higher resolution, McGrath said.
"Shorter reads can't always be assigned to a taxonomic unit," he said, and may not be able to identify organisms down to the species level, while the longer nanopore reads enabled species identification.
In addition, sequencing is fast on the PromethIon. He evaluated the data over time and found that "within one minute we had generated the general picture of what was in the sample," albeit with noisy data. After 10 minutes, there was "essentially no change."