This story has been updated to reflect Mehrdad Hajibabaei's current role at the Centre for Biodiversity Genomics (CBG) in Canada and to include comments from a PacBio spokesperson.
BALTIMORE – As nanopore sequencing continues gaining traction for biodiversity research and species monitoring, new data suggests it can potentially become a suitable alternative to Sanger and Pacific Biosciences sequencing, which are currently widely used for DNA barcode sequencing for species identification.
In a preprint study posted on BioRxiv last month, researchers from the Natural History Museum in London compared nanopore sequencing head-to-head with PacBio sequencing for DNA barcode sequencing and species identification. The results showed that, using morphological identification and Sanger sequencing as reference points, both PacBio and nanopore sequencing delivered high-quality DNA barcode sequences, with nanopore sequencing achieving slightly more accurate results when using Oxford Nanopore Technologies’ newer flow cell and chemistry.
Raju Misra, head of the museum’s molecular biology division and the lead investigator of this study, said the paper stemmed from the team’s involvement with various biodiversity research initiatives including the Darwin Tree of Life project, which aims to sequence the genomes of every eukaryotic species in the UK and Ireland, and the International Barcode of Life (iBOL) project, a research alliance trying to build DNA barcode reference libraries to identify species on Earth.
“The logic behind [DNA barcodes] is that a gene being so well conserved and slow to change will be a good marker for identifying a species or linking a taxonomic name to that species,” said Misra.
The original gold standard for DNA barcode sequencing, Sanger sequencing can reliably generate high-quality reads for DNA sequences up to 700 bp, according to Misra. However, the method, which has been around for several decades, has a significantly lower throughput than today’s standards demand.
Meanwhile, the short-read nature of faster next-generation sequencing (NGS) technologies, such as Illumina sequencing, makes it difficult to tackle longer DNA barcodes, hampering accuracy, Misra said. He added that PacBio sequencing, which offers high throughput and can generate highly accurate long-read sequences, has increasingly become the method of choice for DNA barcoding projects for many researchers.
However, the downside of PacBio sequencing, Misra said, is that the platform often requires a very large number of samples to make each run cost-effective, putting the technology “out of reach” for many smaller research labs and organizations. “[PacBio sequencing] is a great technology, but it's very high throughput,” he pointed out. “Getting hold of tens of thousands of things requires huge numbers of people to collect stuff for you to feed these machines.”
In contrast, nanopore sequencing, which has a higher throughput than Sanger sequencing but tends to be more flexible than PacBio sequencing, is gaining momentum for DNA barcoding projects, Misra said.
For this study, Misra’s team directly compared nanopore sequencing with PacBio sequencing using manual curation and Sanger sequencing as points of reference. Within nanopore sequencing, the researchers also tested five combinations of sequencing reagents and flow cells from Oxford Nanopore, including the Flongle flow cell with the SQK-LSK110 sequencing kit, the R9 flow cell with the SQK-LSK109 kit, the R9 flow cell with the SQK-LSK100 kit, and the R10 flow cell with the newest Q20+ chemistry.
In general, the results showed both nanopore and PacBio sequencing are suitable for DNA barcoding to achieve species identification. In terms of accuracy, Misra said both PacBio and nanopore sequencing “fared comparatively,” with nanopore sequencing using the R10 flow cell and the newest Q20+ chemistry performing “slightly better” than PacBio sequencing. Meanwhile, the Flongle flow cell performed the weakest within the Oxford Nanopore family, though it was still usable for DNA barcoding, Misra added.
As for turnaround time, Misra said the workflow for nanopore sequencing is notably faster than that of PacBio sequencing, given its much shorter sample prep time and real-time data analysis.
As for cost, Misra said that, using Sanger sequencing's per-sample cost as a benchmark, nanopore sequencing with the Flongle flow cell, the smallest flow cell type available from Oxford Nanopore, was the most cost-effective and matched the cost of Sanger sequencing when sequencing more than 60 DNA barcode samples. Nanopore sequencing with the MinIon flow cell and PacBio sequencing, by comparison, became more cost-effective than Sanger sequencing after 183 and 356 samples, respectively.
"It's an interesting study," said Mehrdad Hajibabaei, a professor at the University of Guelph and the chief scientific officer of the Centre for Biodiversity Genomics (CBG) in Canada. "I was actually quite excited to see the comparative analysis they have done."
Provided that the results presented in this preprint, which have not been thoroughly peer-reviewed, are accurate and correctly analyzed, Hajibabaei said he welcomes more comparative studies such as this one, and noted that having more sequencing tools and technologies would be beneficial for the DNA barcoding research and application communities.
Hajibabaei said that for the Canadian Centre for DNA Barcoding, which is a core facility for CBG and the headquarters of the iBOL project, PacBio sequencing has been "the workhorse" for DNA barcoding projects since it has proven to be highly accurate.
That said, for DNA barcoding, "cost, time, and all those other good things are important, but accuracy is the most important element," he said. "If the sequence is not accurate, then identification can be influenced."
Hajibabaei said the DNA barcoding community did not seriously consider some earlier versions of nanopore sequencing because their accuracy was lower. However, as its accuracy improves, he thinks it is gaining more attention from DNA barcoding researchers.
Compared with PacBio sequencing, Hajibabaei said one of the advantages of nanopore sequencing is that its instruments typically require less upfront investment, making the technology more accessible for smaller labs.
In addition, nanopore sequencing "built a good reputation around portability," he said. "That could definitely provide a very important advantage, especially for areas of the world that may not have the same infrastructure as the Western and more modern settings."
Still, Hajibabaei said this study could benefit from a larger sample size as well as a more comprehensive sequence variant analysis between nanopore and PacBio sequencing by directly evaluating sequencing depth, accuracy, and specificity.
Moreover, he said, given this study primarily focused on the cytochrome c oxidase 1 (CO1) gene, a well-conserved gene in the eukaryotic mitochondrial genome that is typically used for DNA barcoding in animals, it is also important for researchers to look at other barcodes for plants, fungi, and protists when comparing the two technologies.
"The developments described in the preprint compare [Oxford Nanopore's] latest R10 flow cell + Q20 chemistry with data from our legacy (circa 2018) Sequel platform and chemistry, analyzed with non-supported software. Meaning, the comparisons are many years out-of-date, predating the popularization of HiFi sequencing on the Sequel II system," a PacBio spokesperson wrote in an email. "This is important because we’ve made important gains in our Sequel II and Sequel IIe platforms — higher accuracy and 10-fold throughput gains, reduced workflow times, and simplification of our library prep with the new Smrtbell prep kit 3.0."
Although the authors claimed that they were able to successfully sequence a higher portion of the 262 specimens using the Oxford Nanopore R10 flow cell paired with the Q20+ chemistry than on PacBio's Smrt Cell 1M, the spokesperson added that the researchers used a single Smrt Cell for PacBio sequencing, and it is unclear how many flow cells were used for the Oxford Nanopore experiments.
"Further, we don’t know if they used circular consensus sequencing (CCS) reads. While the authors of the paper mentioned they turned on CCS mode, that doesn’t guarantee the data was processed downstream by PacBio CCS algorithm," she added.
Encouraged by the results, Misra said his team at the museum plans to expand the usage of nanopore sequencing for DNA barcoding. However, he said that compared with PacBio sequencing, one current bottleneck of nanopore sequencing is that it requires a larger amount of DNA input material, making it challenging to barcode some smaller specimens, such as certain insects.
Besides DNA barcoding, Misra said the museum has already applied nanopore sequencing to environmental DNA (eDNA) metagenomics research to take advantage of the technology's ease of use and in-field capability.
"[What] we were finding increasingly is it's getting harder and harder to take samples out of countries," Misra said. "So, the ability to start doing these types of biodiversity surveys [in the field] is where things really open up."