NEW YORK (GenomeWeb) – Researchers at the Ontario Institute for Cancer Research have developed an open-source base caller for Oxford Nanopore's MinIon sequencer that can run without an Internet connection.
The team published a description of the base caller, Nanocall, in the journal Bioinformatics this week. According to the authors, Nanocall has four main potential applications: in pipelines that need approximate mapping locations, in settings with limited Internet access, in quality assessment to confirm that the correct sample is being sequenced, and as a platform for testing new base callers and models.
Matt Loose, an associate professor at the University of Nottingham, who was not involved with the study, told GenomeWeb that an open-source base caller for MinIon data could be "immensely valuable for those wishing to interpret raw sequence."
In the study, the researchers developed Nanocall using data from the MinIon's R7 pore and tested it on two Escherichia coli and two human samples, comparing its output with reads produced by Metrichor, Oxford Nanopore's cloud-based base calling service.
Nanocall first separates the template and complement strands of the DNA molecule by detecting the hairpin sequence that connects them. Separating the strands increased accuracy, even though Nanocall does not merge the base calls from the two strands into so-called 2D reads.
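The article does not describe how Nanocall recognizes the hairpin. As a rough illustration only, the Python sketch below treats a read as a list of event mean currents and locates the hairpin by sliding a fixed current "signature" along the read and picking the window that matches it best. `HAIRPIN_SIGNATURE`, the read values, and both function names are hypothetical; real signal is far noisier than this.

```python
from typing import List, Sequence, Tuple


def find_hairpin(events: Sequence[float], signature: Sequence[float]) -> int:
    """Return the start index of the window of `events` that best matches
    `signature`, scored by the sum of squared differences."""
    w = len(signature)
    if w == 0 or len(events) < w:
        raise ValueError("read is shorter than the hairpin signature")
    best_start, best_cost = 0, float("inf")
    for start in range(len(events) - w + 1):
        cost = sum((events[start + j] - signature[j]) ** 2 for j in range(w))
        if cost < best_cost:
            best_start, best_cost = start, cost
    return best_start


def split_strands(
    events: Sequence[float], signature: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Split a read's events into the template (before the hairpin)
    and the complement (after the hairpin)."""
    start = find_hairpin(events, signature)
    template = list(events[:start])
    complement = list(events[start + len(signature):])
    return template, complement


# Hypothetical hairpin signature and read (event mean currents, in pA).
HAIRPIN_SIGNATURE = [80.0, 80.0, 80.0, 80.0]
read = [52.1, 60.3, 48.7, 80.2, 79.8, 80.1, 80.0, 55.4, 63.2]
template, complement = split_strands(read, HAIRPIN_SIGNATURE)
print(template)    # [52.1, 60.3, 48.7]
print(complement)  # [55.4, 63.2]
```

Each strand can then be scaled and base-called on its own, which is what allows the single-strand and double-strand options that follow.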
After separating the strands, Nanocall offers several training and scaling options. These include single-strand scaling, in which the template and complement strands are processed independently of each other, and double-strand scaling, in which the same parameters are applied to both strands. Nanocall then calls bases using the Viterbi algorithm, which finds the most likely sequence of hidden states in a hidden Markov model.
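The Viterbi step can be illustrated with a deliberately tiny model. In this sketch, which is not Nanocall's code, each hidden state is the k-mer sitting in the pore, each observation is the mean current of one event, and an event either repeats the current k-mer (a "stay") or advances by one base. The two-letter alphabet, k = 2, the current levels, the noise level, and the stay probability are all invented for illustration; real pore models use the four DNA bases and longer k-mers.

```python
import math
from itertools import product
from typing import List

# Toy model parameters (all hypothetical, chosen only for illustration).
ALPHABET = "AC"   # real DNA uses ACGT
K = 2             # k-mer length; real pore models use longer k-mers
LEVELS = {"AA": 50.0, "AC": 60.0, "CA": 70.0, "CC": 80.0}  # expected current (pA)
SIGMA = 2.0       # shared standard deviation of event means
P_STAY = 0.1      # probability that an event repeats the previous k-mer

STATES = ["".join(p) for p in product(ALPHABET, repeat=K)]


def log_emission(x: float, state: str) -> float:
    """Gaussian log-likelihood of event mean x given the state's expected level."""
    mu = LEVELS[state]
    return -0.5 * ((x - mu) / SIGMA) ** 2 - math.log(SIGMA * math.sqrt(2 * math.pi))


def log_transition(a: str, b: str) -> float:
    """Log probability of moving from k-mer a to k-mer b (stay or one-base step)."""
    p = 0.0
    if a == b:
        p += P_STAY
    if a[1:] == b[:-1]:
        p += (1 - P_STAY) / len(ALPHABET)
    return math.log(p) if p > 0 else -math.inf


def viterbi(events: List[float]) -> List[str]:
    """Return the most likely k-mer path for a non-empty list of event means."""
    # Uniform prior over the starting k-mer.
    score = {s: -math.log(len(STATES)) + log_emission(events[0], s) for s in STATES}
    backpointers = []
    for x in events[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            # Best predecessor of state s, given the scores so far.
            best = max(STATES, key=lambda p: score[p] + log_transition(p, s))
            pointers[s] = best
            new_score[s] = score[best] + log_transition(best, s) + log_emission(x, s)
        backpointers.append(pointers)
        score = new_score
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]


def path_to_bases(path: List[str]) -> str:
    """Collapse a k-mer path into bases; a repeated k-mer is read as a stay,
    so homopolymer runs are ambiguous in this toy model."""
    seq = path[0]
    for prev, cur in zip(path, path[1:]):
        if cur != prev:
            seq += cur[-1]
    return seq


events = [50.3, 59.6, 60.4, 79.5, 70.2]
path = viterbi(events)
print(path)                 # ['AA', 'AC', 'AC', 'CC', 'CA']
print(path_to_bases(path))  # AACCA
```

Working in log space keeps long reads from underflowing, and the back-pointers let the single best path be recovered after one forward pass.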
To test Nanocall, the team ran it on four datasets of PCR-amplified and non-amplified DNA from E. coli and human samples.
The researchers found that Nanocall reads were comparable with Metrichor 1D reads, with about 68 percent matching the reference genomes. On its fastest setting, which skips the training step, Nanocall can process 2,500 kilobases of sequence per hour. On the higher-performance settings, with training and double-strand scaling, speed dropped to around 500 kilobases per hour for the human DNA and up to 763 kilobases per hour for the E. coli DNA.
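Sequencing speeds for a single pore are usually quoted in bases per second, so it can help to convert these hourly figures. The short Python sketch below does only that arithmetic; the hardware and thread count behind the reported numbers are not stated here, so the results are a rough guide rather than a benchmark.

```python
def kb_per_hour_to_bases_per_second(kb_per_hour: float) -> float:
    """Convert a throughput in kilobases per hour to bases per second."""
    return kb_per_hour * 1000 / 3600


# Nanocall throughputs as reported above (kilobases per hour).
reported = {
    "fastest setting, no training": 2500,
    "training + double-strand scaling, human DNA": 500,
    "training + double-strand scaling, E. coli DNA (up to)": 763,
}

for setting, kb_per_hour in reported.items():
    rate = kb_per_hour_to_bases_per_second(kb_per_hour)
    print(f"{setting}: about {rate:.0f} bases per second")
```

This gives roughly 694, 139, and 212 bases per second, respectively.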
In general, the authors noted that double-strand scaling performed better than single-strand scaling, with the difference between the two options more pronounced for the human datasets. On the human data, "single strand scaling leads to an additional 4 percent to 6 percent reads being mismapped," they wrote. Compared with Metrichor 1D reads, Nanocall "increases mismapping rate by an additional 3 percent for E. coli data and 6 percent on human data," the authors wrote. Overall, however, "Nanocall produces reads comparable in mappability and quality to Metrichor 1D."
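As a simplified picture of the two scaling options, assume each event's mean current is roughly a linear function of the expected level of its k-mer, `observed ≈ scale * level + shift`, and that events have already been assigned to k-mers. Single-strand scaling then fits one shift/scale pair per strand, while double-strand scaling, as described above, fits one pair shared by both strands. The linear model, the least-squares fit, the function names, and all values below are assumptions for illustration; Nanocall's actual parameters and training procedure may differ.

```python
from typing import List, Tuple

Params = Tuple[float, float]  # (shift, scale)


def fit_shift_scale(levels: List[float], observed: List[float]) -> Params:
    """Least-squares fit of observed ~= scale * level + shift."""
    n = len(levels)
    if n < 2 or n != len(observed):
        raise ValueError("need at least two paired events")
    mean_l = sum(levels) / n
    mean_o = sum(observed) / n
    var_l = sum((l - mean_l) ** 2 for l in levels)
    if var_l == 0:
        raise ValueError("expected levels must not all be equal")
    cov = sum((l - mean_l) * (o - mean_o) for l, o in zip(levels, observed))
    scale = cov / var_l
    shift = mean_o - scale * mean_l
    return shift, scale


def single_strand_scaling(t_levels, t_obs, c_levels, c_obs) -> Tuple[Params, Params]:
    """Fit template and complement independently (one pair per strand)."""
    return fit_shift_scale(t_levels, t_obs), fit_shift_scale(c_levels, c_obs)


def double_strand_scaling(t_levels, t_obs, c_levels, c_obs) -> Tuple[Params, Params]:
    """Fit one pair shared by both strands by pooling their events."""
    shared = fit_shift_scale(t_levels + c_levels, t_obs + c_obs)
    return shared, shared


# Hypothetical data: template follows shift 5, scale 1.1; complement shift 3, scale 1.0.
t_levels, t_obs = [50.0, 60.0, 70.0, 80.0], [60.0, 71.0, 82.0, 93.0]
c_levels, c_obs = [55.0, 65.0, 75.0], [58.0, 68.0, 78.0]

single = single_strand_scaling(t_levels, t_obs, c_levels, c_obs)
double = double_strand_scaling(t_levels, t_obs, c_levels, c_obs)
print([tuple(round(v, 3) for v in pair) for pair in single])  # [(5.0, 1.1), (3.0, 1.0)]
print([tuple(round(v, 3) for v in pair) for pair in double])
```

The design trade-off is visible even in this toy: separate fits track each strand's own offset, while a shared fit draws on more events per parameter but forces a compromise between the strands.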
Since the OICR team published its Nanocall results, Oxford Nanopore has upgraded its sequencing chemistry and software. The OICR team developed Nanocall for the older pore and chemistry, called R7. The newer version, R9, uses a different nanopore, E. coli CsgG, which Oxford Nanopore licensed from VIB in Belgium and University College London.
In addition, the newer software uses neural networks rather than a hidden Markov model, an approach that improves base-calling accuracy. And last month, the company made offline base-calling software available through its MinKnow platform.
Loose said that, in theory, Nanocall could be modified to work with the R9 pore, which would provide an open-source base caller for the newest version of the MinIon. The improved performance of the R9 pore may also help boost Nanocall's accuracy.
However, Loose noted that Nanocall's speed would be a drawback: the R9 nanopore sequences at around 250 bases per second, but Nanocall's higher-performance setting processes just 185 bases per second.
Loose's team previously developed an analysis method dubbed "Read Until," which enables selective sequencing without target capture, and he said it could be interesting to integrate Nanocall with Read Until to map reads quickly offline. However, he noted that the two pipelines are not yet compatible: Nanocall would need to be faster and able to work with just a fragment of a read.
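To make the fragment requirement concrete, the sketch below shows the kind of decision a Read Until-style pipeline has to make while a molecule is still in the pore: given a base-called fragment from the start of a read, keep sequencing if it appears to come from a target region, otherwise eject the molecule. Read mapping is replaced here by crude k-mer matching, reverse-complement matches are ignored, and the function names, target sequence, and threshold are hypothetical; none of it is taken from Read Until or Nanocall.

```python
from typing import Set


def kmers(seq: str, k: int) -> Set[str]:
    """All length-k substrings of seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def read_until_decision(
    fragment: str, target_kmers: Set[str], k: int, min_fraction: float = 0.5
) -> str:
    """Decide from a base-called read fragment whether to keep sequencing
    ("continue") or eject the molecule ("reject")."""
    frag_kmers = kmers(fragment, k)
    if not frag_kmers:
        return "continue"  # too short to decide; wait for more sequence
    hits = len(frag_kmers & target_kmers)
    return "continue" if hits / len(frag_kmers) >= min_fraction else "reject"


# Hypothetical target region (forward strand only).
TARGET = "ACGTTGCAAGGCTTACCGTA"
target_kmers = kmers(TARGET, 5)
print(read_until_decision("GCAAGGCTTA", target_kmers, 5))  # continue
print(read_until_decision("TTTTTTTTTT", target_kmers, 5))  # reject
```

The decision only pays off if base calling and matching finish while most of the molecule is still unread, which is why speed and the ability to handle partial reads matter here.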
Nevertheless, he said, Nanocall could be very useful to researchers looking to advance the applications of nanopore sequencing. "One key feature of nanopore sequencing is that it is truly real time — the data are available for analysis the moment a DNA molecule passes through the pore," he said. "The development of fast offline base callers will contribute to making this a reality in all scenarios, and especially in the absence of cloud compute."