NEW YORK (GenomeWeb) – Researchers at Genoscope, the national sequencing center in France, have developed a hybrid assembly method that creates nanopore synthetic reads by combining data from Oxford Nanopore's MinIon and the Illumina MiSeq.
In their proof-of-principle study, published last week in BMC Genomics, they showed that they could assemble a 3.6-Mb bacterial genome into one contig with an average accuracy of 99.99 percent.
Jean-Marc Aury, senior author of the study, told GenomeWeb that the method will be useful initially for smaller genomes, but as the throughput and accuracy of the MinIon increase, his team will use it for increasingly larger and more complex genomes.
Genoscope has been testing the MinIon since June, Aury said, and has so far performed about 30 runs on the instrument. Using the hybrid method described in the study, his lab has obtained an error-free nanopore synthetic read as long as 91 kb.
In the study, the researchers describe a method that uses MinIon reads to form a template, which they use to recruit Illumina reads, forming a "seed read." The seed reads are then used to stitch together long, accurate, synthetic reads, Aury explained.
The team also published MinIon data from both an older and a newer version of the chemistry, showing how the instrument has improved.
For instance, using the R7 chemistry, the average read length was around 2.3 kb, but with the newer R7.3 chemistry, the average read lengths were over 10 kb. In one run, with the R7.3 chemistry and a 20-kb fragment library, the team achieved an average read length of 14 kb.
The Genoscope team dubbed its hybrid assembly method NaS, for nanopore synthetic, and developed it because "existing assembly softwares were not implemented to deal with long reads with a high error rate," the authors wrote. Instead, NaS produces synthetic reads from both Oxford Nanopore and Illumina data that are long and accurate before doing assembly.
To demonstrate the method, the researchers sequenced the Acinetobacter baylyi genome with both the MinIon and the Illumina MiSeq.
Next, they used each nanopore read as a template and recruited the Illumina reads that aligned to it, which they called seed reads. The researchers then used the seed reads to recruit other similar Illumina reads and performed a local assembly to create a single nanopore synthetic read.
This process, however, did not work for all regions of the genome, Aury said, leaving the team with some missing reads. "In those very low quality regions, there are no seed reads."
To address these areas, the group designed a second step, using the seed reads as a probe with the Illumina reads as a target and then creating a "dictionary of k-mers" to define the recruitment of the remaining reads. The group recruited an Illumina read to the seed read if it shared three non-overlapping 32-mers.
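The k-mer recruitment rule described above can be illustrated with a minimal Python sketch. It is not the NaS implementation: the function names are hypothetical, only the forward strand is considered, and k-mers must match exactly. The only values taken from the article are the 32-base k-mer length and the threshold of three non-overlapping shared k-mers.

```python
from collections import defaultdict
from typing import Dict, Set

K = 32          # k-mer length reported in the article
MIN_SHARED = 3  # non-overlapping shared k-mers required to recruit a read


def build_kmer_dictionary(seeds: Dict[str, str], k: int = K) -> Dict[str, Set[str]]:
    """Map each k-mer to the names of the seed reads that contain it."""
    kmer_dict: Dict[str, Set[str]] = defaultdict(set)
    for name, seq in seeds.items():
        for i in range(len(seq) - k + 1):
            kmer_dict[seq[i:i + k]].add(name)
    return dict(kmer_dict)


def seeds_recruiting(read: str, kmer_dict: Dict[str, Set[str]],
                     k: int = K, min_shared: int = MIN_SHARED) -> Set[str]:
    """Return the seed reads that share enough non-overlapping k-mers with `read`."""
    hits: Dict[str, int] = defaultdict(int)
    # For each seed, the first position in `read` where a new hit would not
    # overlap the previous hit counted for that seed.
    next_free: Dict[str, int] = defaultdict(int)
    for i in range(len(read) - k + 1):
        for name in kmer_dict.get(read[i:i + k], ()):
            if i >= next_free[name]:
                hits[name] += 1
                next_free[name] = i + k
    return {name for name, n in hits.items() if n >= min_shared}
```

Scanning the read from left to right and skipping k bases after each counted hit yields the largest possible number of non-overlapping matches, so a read needs at least 96 bases in common with a seed read to be recruited under these assumptions.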
Next, they did a microassembly of the reads into NaS reads using Newbler software, which was originally designed to assemble Roche 454 sequence data. The researchers chose this assembler due to its ability to "deal with micro-assembly of synthetic reads," they wrote. In the majority of cases, the team was able to construct one synthetic contig per MinIon read, Aury said. The exception was repetitive regions, which yielded multiple contigs because the assembly algorithm could not resolve repeats. It "broke the contigs around repetitive regions," the authors reported.
For instance, the A. baylyi genome has seven rDNA clusters scattered throughout it, four of which are identical. Initially, there were broken local assemblies in that region. For those cases, the researchers realigned all the contigs to the MinIon reads and then used the alignment to "find a path that describes the organization of contigs across that region," Aury said, which they did by selecting the path with the highest seed-read coverage.
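One way to picture that path selection is as a weighted chaining problem, sketched below in Python. This is an illustration under stated assumptions rather than the authors' algorithm: the `AlignedContig` type, the `best_path` function, the `max_overlap` parameter, and the idea of summing a per-contig seed-read coverage score along the path are all hypothetical.

```python
from typing import List, NamedTuple


class AlignedContig(NamedTuple):
    name: str
    start: int       # alignment start on the MinION read
    end: int         # alignment end on the MinION read (exclusive)
    seed_cov: float  # hypothetical seed-read coverage score for the contig


def best_path(contigs: List[AlignedContig], max_overlap: int = 100) -> List[str]:
    """Return the chain of contigs along the read with the highest summed coverage."""
    if not contigs:
        return []
    ordered = sorted(contigs, key=lambda c: (c.start, c.end))
    n = len(ordered)
    best = [c.seed_cov for c in ordered]  # best score of a chain ending at each contig
    prev = [-1] * n                       # predecessor index in that best chain
    for j in range(n):
        for i in range(j):
            a, b = ordered[i], ordered[j]
            # b may follow a if it extends further along the read and
            # overlaps a by no more than max_overlap bases.
            if b.end > a.end and b.start >= a.end - max_overlap:
                if best[i] + b.seed_cov > best[j]:
                    best[j] = best[i] + b.seed_cov
                    prev[j] = i
    # Walk back from the highest-scoring chain end to recover the path.
    j = max(range(n), key=best.__getitem__)
    path = []
    while j != -1:
        path.append(ordered[j].name)
        j = prev[j]
    return path[::-1]
```

With two competing contigs aligned to the same stretch of a read, this sketch keeps the one with the higher coverage score and chains it with its neighbors on either side.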
In total, they generated 11,275 NaS reads, corresponding to 23x genome coverage. Finally, they used the Celera assembler to assemble those NaS reads. The initial assembly contained three contigs with two regions missing NaS reads. To fill in those gaps, they used the MinIon reads.
The final 3.6-Mb assembly consisted of one scaffold and covered 99.8 percent of the reference genome with an identity greater than 99.98 percent. It was generated using 57x coverage of MinIon reads and 50x coverage of Illumina reads, which created 23x NaS coverage. There was one misassembly, 4.67 mismatches per 100 kb, and 3.20 indels per 100 kb.
They then compared the assembly to an Illumina-only assembly as well as to hybrid assemblies with less MinIon coverage to see whether their hybrid approach offered any advantages.
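As a rough cross-check of those figures, the short Python sketch below derives two numbers the article does not state directly: the mean NaS read length implied by the reported coverage, and the identity implied by the reported error rates. Both are back-of-the-envelope estimates from rounded inputs, and treating identity as one minus the combined mismatch and indel rate is an assumption, not the authors' stated metric.

```python
GENOME_SIZE_BP = 3_600_000    # 3.6-Mb genome, as reported (rounded)
NAS_READS = 11_275            # number of NaS reads, as reported
NAS_COVERAGE = 23             # NaS genome coverage, as reported (rounded)
MISMATCHES_PER_100KB = 4.67   # as reported
INDELS_PER_100KB = 3.20       # as reported


def implied_mean_read_length(coverage: float, genome_size: int, n_reads: int) -> float:
    """Coverage is total sequenced bases divided by genome size, so invert it."""
    return coverage * genome_size / n_reads


def implied_identity_percent(mismatches: float, indels: float,
                             window: int = 100_000) -> float:
    """Assumed identity: 100 * (1 - errors per base), counting mismatches and indels."""
    return 100.0 * (1.0 - (mismatches + indels) / window)


mean_len = implied_mean_read_length(NAS_COVERAGE, GENOME_SIZE_BP, NAS_READS)
identity = implied_identity_percent(MISMATCHES_PER_100KB, INDELS_PER_100KB)
print(f"Implied mean NaS read length: {mean_len:.0f} bp")  # about 7.3 kb
print(f"Implied identity: {identity:.3f} percent")         # about 99.992 percent
```

Under these assumptions, the implied identity of roughly 99.992 percent is consistent with the "greater than 99.98 percent" figure reported for the final assembly.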
The Illumina-only assembly, which used a subset of the 250-bp paired-end reads corresponding to 50x coverage of the genome, consisted of 20 contigs with an N50 of 326 kb and covered 99.7 percent of the reference genome. However, the authors noted, no contigs were able to span the repetitive rDNA clusters.
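For readers unfamiliar with the N50 statistic cited above, it is the contig length at which contigs of that length or longer contain at least half of the assembled bases. The Python sketch below computes it; the function name is hypothetical and the contig lengths in the usage line are toy values, not data from the study.

```python
from typing import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    """Return the N50 of a set of contig lengths (0 for an empty assembly)."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        # The first contig that pushes the running sum to half of the
        # assembly defines the N50.
        if running >= half_total:
            return length
    return 0


# Toy example: total is 32 bases, and the two 8-base contigs reach half of it.
print(n50([2, 2, 2, 3, 3, 4, 8, 8]))  # prints 8
```

A higher N50 for a similar total length indicates a more contiguous assembly, which is why the statistic is often reported alongside the contig count.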
Even hybrid assemblies built with lower MinIon coverage, at 14.4x and 28.6x of the genome, were more contiguous than the Illumina-only assembly, yielding 19 and five scaffolds, respectively.
The hybrid method can be completed in around three to four days, the authors reported, including six hours and three hours for library preparation on the MiSeq and MinIon, respectively; two days for sequencing; and approximately one day for the computational steps of the NaS workflow and genome assembly.
Aury said that currently, the main limitation of the approach is the throughput of the MinIon, which he expects will improve over time.
He said that his lab will continue to use this hybrid method for sequencing and assembly, initially of microbes and other smaller genomes, but would eventually like to move into plant genome assembly and to perform assembly with only MinIon reads.
"The quality improvement is very good in the latest chemistry, so we hope it will be possible to de novo assemble from MinIon reads only," he said. "The first application [of the MinIon] will be genome assembly and de novo sequencing," he said. But, "we deal with metagenomic and metatranscriptomic data and we'd like to test nanopore sequencing on this."