NEW YORK – Two new algorithms have unlocked the potential for targeted sequencing of human and other large genomes using nanopore technology without having to enrich the targets of interest. Immediately applicable in cancer gene and metagenomic sequencing studies, they herald a new approach to nanopore sequencing, where the DNA of interest could change on the fly.
In separate studies posted to bioRxiv last week, teams from Johns Hopkins University and the UK's University of Nottingham described how they exploited the "Read Until" feature of Oxford Nanopore Technologies' platform, first published by the Nottingham group in 2016, to run what are essentially targeted sequencing panels.
"It's a computational filtering approach very different from having to capture things," explained John Tyson, a researcher at the University of British Columbia who is familiar with nanopore sequencing but was not involved in the studies.
The Johns Hopkins team, led by Michael Schatz, sequenced a panel of 148 human genes at 30X coverage on a MinION flow cell. Before implementing their algorithm, dubbed UNCALLED (Utility for Nanopore Current ALignment to Large Expanses of DNA), which reads raw signal to determine whether the pore should spit out the DNA molecule, the same flow cell would have yielded only 5X coverage, he said. "Even with 50X coverage, Illumina sequencing only gets half of the structural variations found" with nanopore sequencing, which also includes DNA methylation information, Schatz said. The team also used Read Until to deplete reads belonging to a particular genome in a bacterial sample.
The Nottingham team, led by Matthew Loose, was able to sequence an even larger panel of 10,000 human genes. Their algorithm used the on-board GPU of a GridION Mk1 instrument to perform basecalling to select reads, although they suggested commercial GPUs might also work. They were able to sequence the entire panel on a MinION flow cell.
The Read Until feature used by both algorithms was revealed by Oxford Nanopore in 2015, but the code needed to run it was only available by request. The standard process works by drawing DNA through a pore using voltage applied across the membrane. That voltage can be selectively reversed on individual pores, ejecting whatever molecule is in them. Ejection happens fast, and the ejected read is unlikely to be drawn back in, due to both statistics and the fact that it has been unwound by a helicase enzyme.
"Before a couple weeks ago, there were no reports of Read Until being effective on anything besides [viral DNA]," said Sam Kovaka, a graduate student in the Schatz lab and the first author on the UNCALLED study.
In his 2016 paper in Nature Methods, Loose demonstrated the ability to target amplicons from the bacteriophage lambda. He originally used raw electrical signal, but moving to basecalling allowed him to incorporate a host of other tools to analyze the data in real time. Raw signal, he said, is "far more computationally intensive," and the reference genome must be converted into signal-like data to compare it to the experimental data.
Loose praised Schatz's team for what it was able to do with UNCALLED. "It's a really neat optimization they have," he said. Schatz noted that the algorithm performs a modified form of basecalling and that more robust technology and improved software on the Oxford Nanopore platform helped both teams get to where they are today.
Either way, the algorithms are efficient enough to enable targeted sequencing on the nanopore platform. Both teams showed enrichment of desired targets, and Schatz's group also showed depletion of unwanted DNA, but both algorithms should be able to reject reads from undesired sequence. "They're two sides of the same coin," Loose said. "Once you have an algorithm that is ejecting reads, it's up to you how you want to use that capability to either enrich or deplete."
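Loose's "two sides of the same coin" point can be made concrete with a minimal Python sketch. Everything in it is hypothetical, including the names Mode and decide and the "keep", "eject", and "wait" labels; it is not taken from UNCALLED or from the Nottingham software. It assumes some upstream step has already reported whether the first portion of a read matched the reference set of interest, or has not yet reached a verdict.

```python
from enum import Enum
from typing import Optional


class Mode(Enum):
    ENRICH = "enrich"    # keep reads that match the reference set
    DEPLETE = "deplete"  # eject reads that match the reference set


def decide(matches_reference: Optional[bool], mode: Mode) -> str:
    """Return 'keep', 'eject', or 'wait' for one read (hypothetical labels).

    matches_reference is None when the upstream step has not yet been
    able to place the read, in which case more signal is needed.
    """
    if matches_reference is None:
        return "wait"
    if mode is Mode.ENRICH:
        return "keep" if matches_reference else "eject"
    # Depletion is the mirror image: the same match now triggers ejection.
    return "eject" if matches_reference else "keep"
```

In this sketch the matching step is identical in both modes; only the final branch flips, which is the sense in which enrichment and depletion share one ejection mechanism.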
While the majority of the flow cell's capacity is spent on reading DNA that will eventually be ejected, the method helps researchers find what they're looking for. This can save money compared to non-targeted nanopore sequencing. Both Schatz and Loose said the method brings the cost of sequencing a panel closer to what it would cost with targeted Illumina sequencing.
"If you commit to doing 300 flow cells, the cost drops to $475 each. For about $500, you can get very comprehensive [tumor] profiling," Schatz said.
Traditional target capture methods have a high upfront cost to design probes and run the assays, Loose said, "whereas with this, you're sampling all the time and picking out what you want. It's a computational filtering approach very different from having to make things or capture things."
"Even in flow cell costs alone it saves money," Loose said. A “really good,” meaning better than average, flow cell could sequence the team's panel with 30X coverage, allowing it to identify structural variants in 15 hours. An average flow cell using adaptive sequencing would yield 15X to 20X coverage of the panel in 20 hours on a cancer cell line, he noted. But without targeted sequencing, getting 30X coverage would take anywhere from four to eight flow cells.
Both studies used DNA from cell lines, but the teams are keen to take the next step and unleash their methods on patient samples. In a related collaboration with Cold Spring Harbor Laboratory and Northwell Health, Schatz's team is finding structural variants in breast cancer patients that he said can only be detected using long reads, including mutations within major cancer risk genes like BRCA1 and CHEK2. "As far as we can tell, these mutations are very rare in the healthy population, but we are eager to use UNCALLED to profile them in additional patients so that we can design a new long-read cancer panel to assess the novel associations we find," he said. In other projects, his team is profiling variants in the human major histocompatibility complex region as well as in other Mendelian disease risk genes. He is also planning to use the method for work in plant genomes, such as tomato.
Schatz said he made UNCALLED available open source and has no plans to patent it. Also, on Monday, Oxford Nanopore announced it would make the Read Until API available to researchers and is working on a so-called "adaptive sampling" API, which it plans to make available before its user meeting in London in May.
Tyson noted that de novo assembly of genomes was another potentially important application of these methods. "We're doing assemblies of larger genomes, so we're interested in targeting regions where you have gaps remaining with longer reads," he said. His lab at the University of British Columbia is looking to find a way to target reads spanning breakpoints of contigs.
Eventually, that kind of targeted sequencing will be done on the fly, based on what the flow cell has already processed, Tyson predicts. "You're going to have a list of coordinates in a file and dynamically select what you're going after at a given time."
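Tyson's "list of coordinates in a file" might look something like the following Python sketch. It assumes a BED-like, tab-separated layout (chromosome, 0-based start, exclusive end), which the researchers did not specify, and the function names load_targets and is_on_target are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Chromosome name -> list of (start, end) intervals, end exclusive.
Targets = Dict[str, List[Tuple[int, int]]]


def load_targets(lines: Iterable[str]) -> Targets:
    """Parse BED-like lines into a per-chromosome interval list."""
    targets: Targets = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        chrom, start, end = line.split("\t")[:3]
        targets[chrom].append((int(start), int(end)))
    return dict(targets)


def is_on_target(targets: Targets, chrom: str, pos: int) -> bool:
    """True if a mapped position falls inside any target interval."""
    return any(start <= pos < end for start, end in targets.get(chrom, []))
```

Because load_targets accepts any iterable of lines, it can be handed an open file object, and changing what is being targeted partway through a run would amount to calling it again on an updated file and passing the new result to is_on_target.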
On Sunday, researchers led by Nick Goldman of EMBL-EBI, and including Loose, posted a preprint on an algorithm that can use an adaptive sequencing strategy on the fly.
"It's a new way of looking at sequencing and I think that's going to be a real shift," Loose said. Though the rest of the tools available to bioinformaticians aren't designed to work with streaming data yet, responsive sequencing "has the potential to be a lot bigger than target enrichment and depletion," he said.
"Dynamically changing what you're sequencing over time is going to be fascinating to explore. And that's going to be a lot of fun to play with," he said.