NEW YORK (GenomeWeb) – Independent research teams have published studies outlining the latest advances in efforts to profile DNA modifications with Oxford Nanopore Technologies' MinION instrument, using electrical current patterns and sequence context information from nanopore sequencing reads.
The chemical treatment-free methods for mapping methylated cytosine and/or adenine bases — appearing in two studies in Nature Methods this week — build on techniques that a University of California, Santa Cruz-led team and a group from the Ontario Institute for Cancer Research, the University of Toronto, and Johns Hopkins University described in BioRxiv preprint papers last year.
Both groups have developed strategies that hinge on hidden Markov models (HMM) for learning current signatures related to specific base modifications, noted Benedict Paten, a molecular engineering and genomics researcher and a member of the UCSC research team.
But Paten said the approaches differ in the precise way that they learn. In particular, he and his colleagues use an approach that combines a variable-order HMM with a hierarchical Dirichlet process (HDP) to assess distributions of ionic current in MinION-generated sequences and trace a given current back to a base modification in a specific sequence context.
As they explained in their preprint article last year, the researchers originally used their HMM-HDP method to distinguish 5-methylcytosine (5-mC) and 5-hydroxymethylcytosine (5-hmC) from unmodified cytosine, primarily in synthetic oligos with known methylation marks that were sequenced with the MinION's R7.3 chemistry.
For the published paper out this week, the team tweaked its approach slightly to suit the latest MinION R9 sequencing chemistry, which bumps up the instrument's accuracy and throughput relative to the R7.3 chemistry available when the original BioRxiv preprint was prepared.
The researchers have now extended their method to detect a modified form of adenine called N6-methyladenine (6-mA). They also applied their methods to map 5-mC and 6-mA across different growth phases in E. coli.
In the original synthetic oligos, the team achieved a median cytosine methylation calling accuracy of 76 percent for template-strand DNA, though accuracy dipped slightly for reads from the complement strand. Mean cytosine methylation detection accuracy was slightly higher, at 79 percent, for the MinION-sequenced template strand of a pUC19 plasmid grown in E. coli, while adenine methylation calls on the template strand were 70 percent accurate, on average.
After training the model to assess cytosine methylation within particular sequence motifs in E. coli genomic DNA, the researchers applied HMM-HDP to map methylation patterns across exponential and stationary growth phases in the bug, uncovering more than 23,000 methylated cytosines at each growth phase considered.
Based on current patterns in MinION-sequenced template-strand DNA from E. coli at the early exponential, late exponential, or stationary growth phase, they also identified between 31,900 and almost 35,000 methylated adenines per growth phase.
Paten anticipates that his team's methods — along with the MinION-based methylation detection methods developed by the OICR's Jared Simpson and colleagues — may eventually make base modification detection a routine feature of MinION sequencing.
"There are a really large number of modifications that can happen in DNA," Paten said in an interview. "It's not crazy to think that in a short period of time, the developments we've created and that [OICR's Simpson and his team] have created will trickle into the standard way that you interpret data from the Oxford Nanopore platform."
"You're always going to get out your base call sequences," he predicts, "but you'll have this extra dimension, which is these modification events. We're going to get there by building these large training sets."
From an informatics perspective, the approach used to detect 6-mA is "identical" to that employed for mapping the 5-mC or 5-hmC cytosine modifications, Paten said. "That's the really nice thing about the [Oxford] Nanopore data: we're getting this raw current data that's then processed and it allows us to recognize differences, if they are significantly different from the canonical bases."
He explained that both current signals and sequence motifs — spanning the modified base and its neighbors — are taken into account to accurately predict and discriminate between various base modifications.
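As a rough illustration of why sequence context matters, the Python sketch below scores a few current samples against the expected levels for the same 5-mer with and without a methylated cytosine. It is not code from either team: it assumes a simple Gaussian current model per 5-mer and uses "M" to mark 5-mC, and the pore-model values and the names PORE_MODEL and methylation_llr are hypothetical.

```python
import math

# Hypothetical pore model: expected current (mean, standard deviation) in pA
# for one 5-mer context, with and without 5-mC. "M" marks the methylated C.
# These numbers are invented; real models are learned from training data.
PORE_MODEL = {
    "ACGTA": (82.0, 1.8),  # unmodified cytosine in a CpG context
    "AMGTA": (84.5, 1.9),  # same neighbors, cytosine methylated
}


def log_normal_pdf(x, mean, sd):
    """Log density of a Gaussian, used as the per-sample emission probability."""
    z = (x - mean) / sd
    return -0.5 * z * z - math.log(sd * math.sqrt(2.0 * math.pi))


def methylation_llr(currents, canonical_kmer, modified_kmer, model=PORE_MODEL):
    """Sum of per-sample log-likelihood ratios, modified vs. canonical.

    Positive values favor the modified base; negative values favor the
    canonical one. The k-mer keys carry the neighboring-base context.
    """
    mu_c, sd_c = model[canonical_kmer]
    mu_m, sd_m = model[modified_kmer]
    return sum(
        log_normal_pdf(x, mu_m, sd_m) - log_normal_pdf(x, mu_c, sd_c)
        for x in currents
    )


# Three hypothetical current samples recorded while this 5-mer was in the pore.
observed = [84.1, 85.0, 83.7]
score = methylation_llr(observed, "ACGTA", "AMGTA")
print(f"log-likelihood ratio: {score:.2f}")
print("call:", "5-mC" if score > 0 else "C")
```

Keying the model on whole k-mers rather than single bases is what lets a sketch like this account for the neighbors of the modified base.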
Although the absolute differences in current between modified and unmodified bases are still relatively small, the way that DNA strands move through the nanopore during sequencing offers multiple snapshots of each base, helping to highlight modifications.
"We essentially see that modification travel through the pore over a duration of time and we're sampling the current over time," Paten said. "As that modification traverses through the pore, we see it at multiple time points and that allows us to compare [overlapping representations of the base] against the background, which gives us more fidelity."
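Paten's point about sampling over time can be seen in an idealized calculation. The hypothetical sketch below assumes that a modification shifts the mean current by a fixed amount, that each sample carries independent Gaussian noise, and that a base is called by comparing the average of n samples with a threshold halfway between the two levels. Real nanopore noise is not this well behaved, and the 2.5 pA shift and 2.0 pA noise figures are invented.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def misclassification_rate(delta_pa, sigma_pa, n_samples):
    """Error rate of a midpoint-threshold call on the mean of n samples.

    Assumes the modified and unmodified levels differ by delta_pa, each sample
    has independent Gaussian noise with standard deviation sigma_pa, and both
    classes are equally likely.
    """
    # Averaging n independent samples shrinks the noise by sqrt(n).
    sigma_mean = sigma_pa / math.sqrt(n_samples)
    # The threshold sits delta/2 away from each level.
    return normal_cdf(-(delta_pa / 2.0) / sigma_mean)


# Hypothetical numbers: a 2.5 pA shift against 2.0 pA of noise per sample.
for n in (1, 4, 9, 16):
    print(n, round(misclassification_rate(2.5, 2.0, n), 3))
```

Under these assumptions the error rate falls from roughly 27 percent with a single sample to under 1 percent with 16 samples.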
He and his team are continuing to develop their base modification analyses and apply them to MinION sequence data, and have started tackling some human samples. He expects that it will soon become relatively routine to glean the most common base modifications, such as cytosine methylation, in the human genome, with "a long tail of generating training data and then learning in all the other different biological contexts that base modifications can occur."
"What we'll need to do is essentially develop a lot of training data that gives accurate profiles of what these different modifications look like and learn them," Paten said. "In principle, all base modifications can be recognized and learned from the methods that we developed and that [Simpson's] group have developed."
The UCSC researchers' HMM-HDP model is available on GitHub at https://github.com/ArtRand/signalAlign.
OICR's Simpson and colleagues in Canada and the US have been developing and applying their own HMM-based approach to discern 5-mC from unmethylated cytosine bases in the CpG context.
"At a high level, we're both using a probabilistic model of the underlying signal data to distinguish between, in our case, cytosine and 5-methylcytosine," Simpson said in an interview.
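Neither team's model is reproduced here, but the general idea of scoring signal data with a probabilistic model can be sketched with a toy hidden Markov model. The Python example below assumes a left-to-right HMM in which each state is one k-mer with a Gaussian current level, a state may emit several events before advancing, and the forward algorithm sums over all such alignments. The levels, the stay probability, and the names forward_log_likelihood, UNMETHYLATED, and METHYLATED are all hypothetical.

```python
import math


def log_normal_pdf(x, mean, sd):
    """Log density of a Gaussian emission."""
    z = (x - mean) / sd
    return -0.5 * z * z - math.log(sd * math.sqrt(2.0 * math.pi))


def logsumexp(a, b):
    """Numerically stable log(exp(a) + exp(b)), tolerating -inf."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def forward_log_likelihood(events, levels, p_stay=0.3):
    """Log P(events | expected levels) under a toy left-to-right HMM.

    Each hidden state is one k-mer with a Gaussian emission (mean, sd).
    From a state the model either stays (p_stay), so one k-mer can emit
    several events, or advances to the next k-mer (1 - p_stay). Paths must
    start in the first state and end in the last.
    """
    n_states = len(levels)
    log_stay = math.log(p_stay)
    log_move = math.log(1.0 - p_stay)
    # alpha[j]: log probability of the events seen so far, ending in state j.
    alpha = [-math.inf] * n_states
    alpha[0] = log_normal_pdf(events[0], *levels[0])
    for x in events[1:]:
        new = [-math.inf] * n_states
        for j in range(n_states):
            total = alpha[j] + log_stay
            if j > 0:
                total = logsumexp(total, alpha[j - 1] + log_move)
            new[j] = total + log_normal_pdf(x, *levels[j])
        alpha = new
    return alpha[-1]


# Hypothetical (mean, sd) levels in pA for three consecutive k-mers covering
# one CpG site, with and without 5-mC. Values are invented.
UNMETHYLATED = [(70.0, 1.5), (82.0, 1.8), (95.0, 1.6)]
METHYLATED = [(70.5, 1.5), (84.5, 1.9), (96.5, 1.6)]

events = [70.2, 70.9, 84.0, 84.8, 83.9, 96.1]
llr = forward_log_likelihood(events, METHYLATED) - forward_log_likelihood(events, UNMETHYLATED)
print(f"log-likelihood ratio (5-mC vs C): {llr:.2f}")
```

Because both hypotheses share the same transition structure, any difference in the ratio comes only from how well each set of expected levels explains the events; a positive value favors 5-mC.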
Because the MinION pore changed from the R7.3 to the R9 chemistry not long after their BioRxiv preprint came out last year, Simpson explained, the team restructured its HMM-based model and made other changes to make the approach compatible with the new nanopore chemistry, then repeated its earlier E. coli and human lymphoblastoid cell line experiments with the updated pore.
"One of the big things we did after the first round of reviews was to re-do the experiments using the higher-accuracy pore to show that we see the same effect, and we do," Simpson noted, "and then to quantify how much better we call methylated cytosines using the R9 pore versus R7.3."
For its new Nature Methods paper, the team also applied its approach to methylation patterns in two breast cell lines: one derived from non-tumorigenic epithelial breast tissue and an aggressive metastatic breast cancer line.
The two cell lines were subjected to reduced representation MinION sequencing with the R7.3 chemistry and to more traditional Illumina bisulfite sequencing, both to compare the approaches and to call methylated cytosines at higher depth.
When they focused on regions with relatively high MinION sequence coverage, the authors noted that "data from the bisulfite-based and nanopore-based sequencing approaches largely showed the same trends in the amount of methylation when accounting for coverage levels."
The OICR-led team is making its own 5-mC analysis pipeline freely available on GitHub. Beyond analyzing new MinION sequences, the model could also be applied retroactively to existing MinION current data to find 5-mC in the CpG context, Simpson noted, though further training would be needed to assess other types of methylation.
He and his colleagues plan to continue applying their approach to samples with higher-depth sequencing data and to MinION-based sequences from cancer genomes. They are also working on phasing the methylation profiles produced with the MinION sequences and will likely expand the scope of base modifications that their model considers going forward.
"In future work, we plan to generate more extensive training sets for all of the possible methylation combinations, including non-CpG methylation," Simpson and his co-authors wrote.