BALTIMORE – Researchers at the University of California, Santa Cruz have developed a method that can convert Illumina libraries with short DNA fragments into long DNA molecules optimal for nanopore sequencing with the Oxford Nanopore Technologies MinIon platform.
The workflow, named "Illumina But With Nanopore" (IBWN), employs a strategy to circularize short Illumina library molecules and copy them by rolling circle amplification, resulting in long molecules with tandem repeats that enable MinIon sequencing for almost all Illumina libraries with comparable cost and accuracy to the Illumina MiSeq.
There were two main motivations behind developing the protocol, said Christopher Vollmers, a biomolecular engineering professor at UC Santa Cruz who presented the method, described in a preprint on BioRxiv last month, at the Advances in Genome Biology and Technology annual meeting in June.
For one, he said, his lab, like most small molecular biology labs, lacks an on-site Illumina sequencer, which often means a turnaround time of several weeks when samples are processed through a genomic core facility. To reduce that wait time, the group sought to devise a way to sequence Illumina libraries using the Oxford Nanopore MinIon, which even a small lab like his can afford to purchase.
The other incentive for developing the method was to "massively lower" the entry barrier for graduate students to interact with high-throughput sequencing technology firsthand, Vollmers said. A lot of Ph.D. students in labs without Illumina sequencing platforms cannot get first-person experience with the technology, he said, adding that "we don't do anybody a favor if we give people molecular biology Ph.D.s, but they have never handled the Fastq files."
Mechanistically, Vollmers said, IBWN is built upon the Rolling Circle Amplification to Concatemeric Consensus (R2C2) method that was previously published and optimized by his lab for full-length cDNA sequencing.
"The idea is straightforward," he said, adding that the method leverages the fact that the vast majority of Illumina libraries have known adapters on their ends.
By targeting these double-stranded adapters, R2C2 can circularize short library molecules using Gibson assembly and copy them using rolling circle amplification with Phi29 polymerase, creating long, linear, double-stranded DNA pieces that contain multiple copies of the original Illumina library molecule's sequence. These long molecules can then be sequenced on the MinIon platform.
"Whatever you put on your Illumina sequencer, that's where we come in and convert," Vollmers pointed out. "You don't have to change the Illumina prep at all, [IBWN] just builds on top of it."
To analyze the data, the team also developed software called Concatemeric Consensus Caller with Partial Order alignments (C3POa), which can process the R2C2 nanopore data to generate consensus reads while demultiplexing the Illumina library indexes.
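The consensus idea can be illustrated with a toy example. The Python sketch below is not C3POa and does not use partial order alignment; it assumes substitution errors only, exact matches to a made-up splint sequence, and hypothetical helper names (make_concatemer, split_subreads, majority_consensus). It builds a tandem-repeat read, adds two errors to different copies, and recovers the original insert by a column-wise majority vote.

```python
from collections import Counter


def make_concatemer(insert, splint, copies):
    """Simulate a rolling-circle product: insert plus splint repeated in tandem."""
    return (insert + splint) * copies


def split_subreads(read, splint):
    """Cut a concatemeric read at every exact occurrence of the splint."""
    return [piece for piece in read.split(splint) if piece]


def majority_consensus(subreads):
    """Column-wise majority vote over the subreads that share the most common length."""
    if not subreads:
        return ""
    target_len = Counter(len(s) for s in subreads).most_common(1)[0][0]
    usable = [s for s in subreads if len(s) == target_len]
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*usable))


# Hypothetical sequences, chosen only for illustration.
insert = "ACGTTGCAAGGCTA"
splint = "GGGCCCTTTAAA"
unit = len(insert) + len(splint)

# Five tandem copies, with one substitution in copy 1 and one in copy 3.
noisy = list(make_concatemer(insert, splint, copies=5))
noisy[1 * unit + 3] = "A"   # T -> A in the second copy
noisy[3 * unit + 7] = "C"   # A -> C in the fourth copy
noisy_read = "".join(noisy)

consensus = majority_consensus(split_subreads(noisy_read, splint))
print(consensus == insert)
```

Because each error appears in only one of the five copies, the vote at every position still favors the original base. A real consensus caller also has to cope with insertions, deletions, and inexact adapter matches, which this sketch ignores.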
To test and benchmark the method, the team applied IBWN to RNA-seq libraries of the human A549 cancer cell line, Illumina ChIP-seq libraries of soybean samples, Illumina Tn5-based genomic DNA libraries of a Wolbachia-containing Drosophila melanogaster cell line, and Illumina Tn5-based genomic DNA libraries from lung cancer cell lines enriched for certain cancer-relevant genes.
Overall, the researchers found that the R2C2 method led to MinIon sequencing data with accuracy comparable to an Illumina MiSeq 2x300 bp run, independent of the read position. In particular, they reported that R2C2 RNA-seq data were "almost entirely interchangeable" with data produced by the Illumina MiSeq. Meanwhile, R2C2 library metrics from ChIP-seq and target-enriched Tn5 libraries were "very similar" to those generated by Illumina sequencers.
Beyond that, the group also sought to achieve real-time analysis of R2C2 libraries using nanopore sequencing. For that, they developed a computational pipeline called Processing Live Nanopore Experiments (PLNK), which can carry out basecalling, process raw sequencing data into R2C2 consensus reads, demultiplex the libraries, and align the demultiplexed R2C2 reads to a genome in real time.
In terms of cost, Vollmers said, without taking instrument cost into account, R2C2 sequencing with MinIon, which can generate up to almost 9 million reads from a single flow cell, is "about equivalent, maybe a bit more expensive" than traditional Illumina sequencing using MiSeq. However, he pointed out that for a small lab, the biggest expense is often labor, and the fast turnaround afforded by R2C2 sequencing still makes the method cost-effective. "If I have a grad student sitting around for a month waiting for data and having to work on something that's not the main project, that is a big loss for me," he said.
When it comes to applications, Vollmers said the R2C2 approach would help molecular biology labs carry out small experiments, such as RNA-seq or amplicon sequencing, that would normally require a MiSeq. It would also help them achieve real-time QC of their libraries before starting a sequencing run. In addition, with its protocol and analysis tools being open source, Vollmers said "there is no secret sauce" for the R2C2 method.
"My first impression is that [this method] is really cool," said Danny Miller, a physician scientist at the University of Washington who is experienced with nanopore sequencing. In particular, Miller said, the method's ability to evaluate the sequencing libraries in real time is "really nice."
As with most nanopore sequencing-based methods, Miller said the error rates for single nucleotide variants (SNVs), indels, and homopolymers are three important metrics to consider when gauging the sensitivity of this method. However, SNVs should not be a big concern in this particular case, he added, given that amplification means each DNA molecule is sequenced multiple times.
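The reasoning about repeated copies can be made concrete with a simple binomial calculation. The Python sketch below rests on assumptions that are not from the preprint: errors are independent between copies, every wrong copy shows the same wrong base (a worst case), and the 5 percent per-read error rate is a made-up figure. The function name majority_error is hypothetical.

```python
from math import comb


def majority_error(per_read_error, copies):
    """Probability that more than half of the copies are wrong at one position.

    Assumes independent errors that all agree on the same wrong base,
    so a strict majority of wrong copies flips the consensus call.
    """
    p = per_read_error
    return sum(
        comb(copies, k) * p**k * (1 - p) ** (copies - k)
        for k in range(copies // 2 + 1, copies + 1)
    )


# Hypothetical 5 percent per-read substitution rate, odd copy counts only.
for n in (1, 3, 5, 7):
    print(n, majority_error(0.05, n))
```

Under these assumptions the chance of a wrong consensus base falls quickly as the number of copies grows. The calculation does not apply to systematic errors, which recur in every copy and therefore do not average out.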
Apart from the occasional indel, Vollmers said, R2C2 reads are currently "as accurate as" Illumina's. However, he said nanopore sequencing's systematic error in homopolymers, which can lead to consensus read accuracy "worse than" Illumina's, remains a limitation for the R2C2 workflow, and the method should be considered carefully before being used for any application that requires exceptionally high consensus accuracy.
Moving forward, in addition to the libraries benchmarked in the preprint, Vollmers said his team is further evaluating the R2C2 method with highly multiplexed libraries, such as those used in single-cell and spatial genomics experiments.
Additionally, he said the lab has ordered an Oxford Nanopore PromethIon 2 Solo device, which is still in early access, and plans to test the method on the platform once it arrives. "That could really change stuff, because across two [PromethIon] flow cells, you could have 100 million reads," Vollmers said. "Then you are getting close to a mid-range [Illumina] NextSeq output."
Moreover, he said the team will continue to streamline the protocol for R2C2 and try to eliminate the overnight incubation step that is currently required during rolling circle amplification.
"We are optimizing every step," Vollmers said. "The goal is to come up with a protocol that goes from starting with an Illumina library in the morning to sequencing in the afternoon."