SAN DIEGO (GenomeWeb) – Researchers at Cold Spring Harbor Laboratory have sequenced and assembled the genome of the budding yeast Saccharomyces cerevisiae using an early-access version of Oxford Nanopore Technologies' nanopore instrument, the MinIon.
The work is described in a preprint posted to bioRxiv earlier this month. Co-senior author Michael Schatz presented findings from the effort during a bioinformatics session at the annual Plant and Animal Genome conference here yesterday.
To take advantage of the very long but error-prone Oxford Nanopore reads, Schatz explained, the team adapted error-correction and assembly approaches previously developed for long single-molecule, real-time (SMRT) reads from Pacific Biosciences, then integrated the corrected consensus MinIon reads into hybrid assemblies.
"It follows that logical design: we take the [Illumina] MiSeq data and align it to the long reads to try to clean them up," Schatz told GenomeWeb following his PAG presentation. "But there were a lot of new technical enhancements that had to be done to make it work."
Both the PacBio and Oxford Nanopore technologies produce long reads that can span complicated or repeat-rich regions of the genome, which are tricky to assemble using short-read sequencing data.
The underlying technology is very different, though, Schatz explained: the MinIon measures changes in ion flow from one side of a membrane to the other as DNA molecules move through a protein nanopore embedded in it.
Depending on the shape, size, electrostatic state, and other properties of the DNA in the pore, the molecule produces different voltage or current patterns, which a sensor below the membrane detects and an Oxford Nanopore base caller translates into nucleotide sequence.
The MinIon's sensor array can produce 512 reads at once from a thumb-drive-sized device powered through a computer's USB port, he said, noting that the pore registers stretches of about five bases, or 5-mers, together because of the width of the membrane.
To prepare samples for sequencing, researchers start with DNA molecules roughly 10,000 base pairs long, then attach a hairpin adaptor to one end of each molecule and a motor protein to the other.
The motor protein not only helps unwind the two DNA strands but also helps attach the molecule to the edge of a nanopore, Schatz explained. From there, a single strand of the DNA is translocated through the pore.
Because the hairpin adaptor links the two complementary strands at the other end of the molecule, he added, the second strand can be coaxed through the pore as well, providing a second look at the same stretch of DNA.
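To make the 5-mer point concrete, here is a minimal Python sketch, assuming an idealized pore in which each signal level corresponds to exactly one 5-mer and the strand advances one base per step (real signals are noisier and the step size varies). The helpers `kmers` and `stitch` are hypothetical illustrations, not part of any Oxford Nanopore software.

```python
K = 5  # bases that sit in the pore together, per the article

def kmers(seq: str, k: int = K) -> list[str]:
    """Return the overlapping k-mers of seq, one per idealized pore position."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def stitch(kmer_path: list[str]) -> str:
    """Rebuild a sequence from consecutive k-mers that each shift by one base."""
    if not kmer_path:
        return ""
    seq = kmer_path[0]
    for prev, cur in zip(kmer_path, kmer_path[1:]):
        # Neighboring k-mers must share k-1 bases if the strand moved one step.
        if prev[1:] != cur[:-1]:
            raise ValueError(f"{prev} and {cur} do not overlap")
        seq += cur[-1]  # each step contributes one new base
    return seq

read = "ACGTACGGA"
path = kmers(read)
print(path)                  # ['ACGTA', 'CGTAC', 'GTACG', 'TACGG', 'ACGGA']
print(stitch(path) == read)  # True
```

A 9-base stretch yields only five distinct pore states, which is one reason a base caller has to decode overlapping context rather than read bases one at a time.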
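Because the hairpin joins the two strands, the second pass reads the complementary strand, which runs in the opposite direction. The minimal sketch below assumes error-free reads; it shows that reverse-complementing the complement-strand read recovers the template sequence, which is what allows the two passes to be compared base by base. The function name `reverse_complement` is illustrative.

```python
# Translation table mapping each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]

template_read = "ATGCGTTAC"
# Idealized read of the complement strand after it follows the hairpin through the pore.
complement_read = reverse_complement(template_read)
print(complement_read)                                       # GTAACGCAT
print(reverse_complement(complement_read) == template_read)  # True
```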
Using an early access instrument obtained last spring, the team applied this approach to DNA from a yeast strain previously sequenced with the PacBio RS instruments.
The MinIon runs generally took two to three days, Schatz noted, and produced anywhere from a handful to tens of thousands of reads per run.
All told, the researchers generated around 267,000 MinIon reads over 30 runs, enough sequence to cover the S. cerevisiae genome to an average depth of around 122-fold. The flow cell cost per run was around $1,000, Schatz noted.
To analyze the data, the group turned to some of the protocols developed for PacBio sequencing, though the methods had to be modified to handle the different error profile of the Oxford Nanopore reads.
For instance, Schatz noted, the early-access MinIon's error rate was roughly 35 percent, well above the 10 to 15 percent previously reported for PacBio SMRT sequencing.
The researchers found that accuracy can be improved somewhat with "2D base calling," which also takes into account the readout from the second, complementary strand.
They also determined that reads of relatively modest length, up to around 40,000 bases, tended to align best to the existing yeast genome, while reads in the 40,000- to 150,000-base range were noisy and proved problematic in alignments.
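The coverage figure can be unpacked with back-of-the-envelope arithmetic: average depth is total sequenced bases divided by genome size. The sketch below assumes a genome size of roughly 12 million base pairs, an approximate figure for S. cerevisiae that does not come from the article, so the implied totals are rough estimates rather than reported results.

```python
def implied_bases(depth: float, genome_size_bp: float) -> float:
    """Total sequenced bases needed for a given average depth: depth x genome size."""
    return depth * genome_size_bp

GENOME_SIZE_BP = 12_000_000  # assumption: approximate S. cerevisiae genome size
DEPTH = 122                  # average depth reported in the article
N_READS = 267_000            # reads generated over all runs
RUNS = 30
COST_PER_RUN_USD = 1_000     # approximate flow cell cost per run

total_bases = implied_bases(DEPTH, GENOME_SIZE_BP)
mean_read_length = total_bases / N_READS
flow_cell_cost = RUNS * COST_PER_RUN_USD

print(f"total bases      ~ {total_bases / 1e9:.2f} Gb")  # ~ 1.46 Gb
print(f"mean read length ~ {mean_read_length:,.0f} bp")  # ~ 5,483 bp
print(f"flow cell cost   ~ ${flow_cell_cost:,}")         # ~ $30,000
```

Under that genome-size assumption, the reads would average roughly 5,500 bases, and the flow cells for the whole project would come to about $30,000.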
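One way to see why a 35 percent error rate forces changes to PacBio-era methods is to ask how often a short stretch of a long read is completely error-free, since seed-based aligners such as BLAST need exact short matches to get started. This worked illustration assumes errors strike each base independently and uses an illustrative seed length of 11 bases; neither assumption comes from the article.

```python
def p_error_free(k: int, error_rate: float) -> float:
    """Probability that k consecutive bases contain no errors, assuming independent errors."""
    return (1.0 - error_rate) ** k

K = 11  # assumption: illustrative seed length for a BLAST-style aligner
for label, err in [("PacBio, ~15% error", 0.15), ("early MinIon, ~35% error", 0.35)]:
    print(f"{label}: P(error-free {K}-mer) = {p_error_free(K, err):.3f}")
```

Under these assumptions, about 17 percent of 11-base windows in a read with 15 percent error are clean, versus under 1 percent at 35 percent error. Because the Illumina reads being aligned are themselves accurate, the long read's error rate is what limits seeding.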
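The article does not describe how Oxford Nanopore's 2D base caller works internally. As a toy illustration of why a second look helps, the sketch below assumes the two passes are already oriented the same way, aligned position by position with no insertions or deletions, and annotated with hypothetical per-base confidence scores; at each position it keeps whichever call is more confident. The function `merge_two_passes` is hypothetical.

```python
def merge_two_passes(template: str, t_qual: list[int],
                     complement_rc: str, c_qual: list[int]) -> str:
    """Toy merge: at each aligned position, keep the base call with the higher score.

    Assumes the complement read is already reverse-complemented and that both reads
    are aligned without indels, which real nanopore data does not guarantee.
    """
    if not (len(template) == len(t_qual) == len(complement_rc) == len(c_qual)):
        raise ValueError("inputs must be pre-aligned and of equal length")
    merged = []
    for t_base, tq, c_base, cq in zip(template, t_qual, complement_rc, c_qual):
        merged.append(t_base if tq >= cq else c_base)  # ties favor the template
    return "".join(merged)

# Hypothetical passes over the true sequence ACGTAC, each with one low-confidence error.
print(merge_two_passes("ACCTAC", [9, 9, 2, 9, 9, 9],
                       "ACGTTC", [8, 8, 8, 8, 1, 8]))  # ACGTAC
```

Errors that appear in only one pass can be outvoted by the other, which is the intuition behind the accuracy gain the researchers observed.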
To improve the accuracy of the MinIon reads, the researchers developed an error correction pipeline called NanoCorr.
The approach resembles methods they have used to correct PacBio reads in the past, in that it uses Illumina short reads to correct error-prone long reads, Schatz said, though NanoCorr's implementation is tailored to the MinIon's error profile.
NanoCorr first uses BLAST to align Illumina MiSeq reads against the raw Oxford Nanopore reads. A dynamic programming algorithm then picks the short reads that most closely match each long read, and the pipeline calculates a consensus sequence for each Oxford Nanopore read from them.
In the yeast and microbial species tested, Schatz and his colleagues found, these consensus sequences had error rates of around 3 percent, far lower than those of the raw reads.
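The article does not spell out NanoCorr's dynamic program. As a stand-in, the sketch below uses a different, well-known technique, weighted interval scheduling, to choose the highest-scoring set of non-overlapping short-read hits along one long read, then splices those short-read bases in and keeps the raw long-read bases in uncovered gaps. It assumes the BLAST step has already projected each hit onto long-read coordinates and skips the full consensus step. `Hit`, `select_hits`, and `correct` are hypothetical names.

```python
from bisect import bisect_right
from typing import NamedTuple

class Hit(NamedTuple):
    """A short-read alignment projected onto long-read coordinates [start, end)."""
    start: int
    end: int
    score: float
    seq: str  # short-read bases that replace long-read positions [start, end)

def select_hits(hits: list[Hit]) -> list[Hit]:
    """Weighted interval scheduling: non-overlapping hits with maximum total score."""
    hits = sorted(hits, key=lambda h: h.end)
    ends = [h.end for h in hits]
    best = [0.0] * (len(hits) + 1)  # best[i]: best total score using the first i hits
    take = [False] * (len(hits) + 1)
    for i, h in enumerate(hits, start=1):
        j = bisect_right(ends, h.start, 0, i - 1)  # earlier hits ending by h.start
        if best[j] + h.score > best[i - 1]:
            best[i], take[i] = best[j] + h.score, True
        else:
            best[i] = best[i - 1]
    chosen, i = [], len(hits)
    while i > 0:  # trace back through the table to recover the chosen hits
        if take[i]:
            h = hits[i - 1]
            chosen.append(h)
            i = bisect_right(ends, h.start, 0, i - 1)
        else:
            i -= 1
    return chosen[::-1]

def correct(long_read: str, hits: list[Hit]) -> str:
    """Splice in selected short-read bases; keep raw long-read bases elsewhere."""
    out, pos = [], 0
    for h in select_hits(hits):
        out.append(long_read[pos:h.start])
        out.append(h.seq)
        pos = h.end
    out.append(long_read[pos:])
    return "".join(out)

raw = "ACCTACGAAC"  # hypothetical long read; true sequence ACGTACGTAC
hits = [Hit(0, 5, 5.0, "ACGTA"), Hit(3, 8, 4.0, "TACGT"), Hit(5, 10, 5.0, "CGTAC")]
print(correct(raw, hits))  # ACGTACGTAC
```

The overlapping middle hit is dropped because the two flanking hits score higher together; a real corrector would instead combine overlapping evidence into a consensus.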
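The article does not say exactly how the error rates were computed. One common way to express a read's error rate is its edit distance to the reference (substitutions, insertions, and deletions) divided by the reference length. A minimal sketch under that assumption follows, with hypothetical example sequences.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via dynamic programming, keeping two rows of the table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # match or substitution
        prev = cur
    return prev[-1]

def error_rate(read: str, reference: str) -> float:
    """Edits per reference base, one simple definition of a read's error rate."""
    return edit_distance(read, reference) / len(reference)

reference = "ACGTACGTAC"            # hypothetical 10 bp reference
read = "ACGAACGTAC"                 # hypothetical read with one substitution
print(error_rate(read, reference))  # 0.1
```

By this measure, a 3 percent error rate means about three edits per 100 reference bases, versus about 35 per 100 for the raw early-access reads.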
For the yeast strain that it sequenced, the team then plugged these error-corrected Oxford Nanopore reads and Illumina short reads into a modified version of the Celera assembler, producing a hybrid assembly that's more contiguous and complete than could be achieved using Illumina reads alone.
"We've already paid those dues in the sense that we've invested a lot of time to retrofit the assembler to support very long reads," Schatz said, noting that the assembler could support reads that are up to 500,000 base pairs long, should such read lengths become feasible.
For example, the assembly that the CSHL group put together using Illumina data alone had more than 99.9 percent identity with the yeast reference but a contig N50 of only around 60,000 base pairs. The assembly that included consensus MinIon reads, by contrast, had a contig N50 exceeding 470,000 base pairs and a consensus identity of 99.78 percent.
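Contig N50 is the contig length at which contigs of that length or longer account for at least half of the total assembly length, so a larger N50 means more of the genome sits in long, unbroken pieces. A minimal sketch of the calculation, using hypothetical contig lengths:

```python
def n50(contig_lengths: list[int]) -> int:
    """Length L such that contigs of length >= L hold at least half the assembled bases."""
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    raise ValueError("no contigs given")

print(n50([100, 80, 60, 40, 20]))  # 80: 100 + 80 = 180 covers half of 300
```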
In particular, Schatz noted that the long MinIon reads made it possible to annotate new features in the genome, including transposons and other very long elements missed in the Illumina-only assembly.
"Longer [genome] features are missing in the MiSeq assembly. We recover the vast majority of them with the nanopore [reads]," Schatz said. "If you're interested at all in telomeres or gene cassettes or some of the synteny block-type elements, you need the long read technology."
Researchers have also successfully applied MinIon sequencing to bacterial genomes, he said, explaining that an approach similar to the one used in yeast can produce a near-perfect Escherichia coli genome.
While the current iteration of the Oxford Nanopore instrument cannot be plugged directly into a smartphone, Schatz and a high school group he mentors have already written an app called iGenomics for rapidly obtaining genotype profiles and other data from reads generated on a thumb-drive-sized sequencer.
At the moment, the MinIon is the only sequencer small enough to fit the bill, though Schatz noted that the app is also compatible with sequencing data produced on other platforms. The idea is to establish a small, easily transportable system for identifying a potential pathogen, and potential treatments for it, in the field.