NEW YORK (GenomeWeb) – Two research groups have independently developed methods to use Oxford Nanopore Technologies' MinIon nanopore sequencer to identify methylated cytosine bases.
In two papers recently posted to the bioRxiv preprint server, one by a team from the Ontario Institute for Cancer Research and Johns Hopkins University and the other by a group from the University of California, Santa Cruz, the researchers demonstrated that statistical models applied to the ionic current generated from passing DNA through the MinIon nanopore could distinguish certain methylation marks without requiring any chemical treatment of the DNA.
Previously, researchers had demonstrated experimentally that nanopore sequencing could identify base modifications, but currently, no methods are available to routinely call methylation marks as part of the base calling process for sequencing on the MinIon.
In the preprint posted by the OICR/Johns Hopkins team, the researchers used a hidden Markov model to calculate the probability that a cytosine in a CpG context would be methylated.
The team first validated their algorithm on PCR amplicons from Escherichia coli strains with no methylation events, comparing their results to Oxford Nanopore's reference parameters. Next, they induced methylation on the same E. coli PCR amplicons, converting cytosine in a CpG context to 5-methylcytosine, and validated that the sites were in fact methylated using bisulfite sequencing. They then sequenced the amplicons on the MinIon and aligned the data to the reference parameters. Because of the methylation, there were many k-mers that differed significantly from the reference models.
The team then sought to develop an algorithm that could identify those differences. They did this by creating a new "alphabet," in which base calling identified not only A, C, G, and T, but also 5-methylcytosine in a CpG context, which they classified as M.
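The expanded alphabet can be pictured with a short Python sketch. This is an illustration only, not the team's code: the function names `methylate_cpgs` and `kmers` are hypothetical, the k-mer length of 6 is taken from the 6-mers mentioned later in the article, and the sketch assumes every CpG in the sequence is methylated, as in a fully methylated control.

```python
def methylate_cpgs(seq: str) -> str:
    """Rewrite every C that is directly followed by G as M (5-methylcytosine).

    Assumption: all CpG sites are methylated; mixed states are not modeled.
    """
    return seq.replace("CG", "MG")


def kmers(seq: str, k: int = 6) -> list[str]:
    """Return the overlapping k-mers of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


# A toy sequence with two CpG sites, rewritten in the five-letter alphabet.
methylated = methylate_cpgs("ACGTACGT")
print(methylated)            # AMGTAMGT
print(kmers(methylated, 6))  # ['AMGTAM', 'MGTAMG', 'GTAMGT']
```

Each k-mer containing an M would then need its own expected current level in the model, which is why methylated amplicons are needed for training.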
Finally, the group tested its model on a human lymphoblast cell line with known methylation, along with two controls: one methylation negative and one methylation positive.
While the model was able to detect 5-mC in the context of CpGs, it was not able to detect methylation outside of that context or identify k-mers that had a mix of methylated and nonmethylated CpGs.
To determine accuracy, the researchers randomly sampled 100,000 singleton sites from both the positive and negative control datasets, determining that their results had an accuracy of 82 percent. By increasing the stringency for making a call, they were able to increase accuracy of those calls to 95 percent, but made fewer total calls.
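The trade-off between stringency and the number of calls can be sketched in a few lines of Python. The article does not say which statistic the team thresholded, so the per-site score used here (positive favors methylated, negative favors unmethylated), the function name `call_sites`, and the toy values are all assumptions made for illustration.

```python
def call_sites(scores, truth, threshold):
    """Call only sites whose |score| meets the threshold.

    Returns (accuracy, number_of_calls); accuracy is None if nothing is called.
    A site is called methylated when its score is positive.
    """
    calls = [(s > 0, t) for s, t in zip(scores, truth) if abs(s) >= threshold]
    if not calls:
        return None, 0
    correct = sum(1 for called, actual in calls if called == actual)
    return correct / len(calls), len(calls)


# Invented scores and true methylation states for eight sites.
scores = [3.1, -2.7, 0.4, -0.3, 2.2, -0.2, 0.9, -4.0]
truth = [True, False, False, True, True, False, True, False]

print(call_sites(scores, truth, 0.0))  # (0.75, 8)
print(call_sites(scores, truth, 2.0))  # (1.0, 4)
```

In this toy data, raising the threshold drops the ambiguous sites near zero, so accuracy rises while the number of calls falls. That mirrors the pattern the researchers reported, though not their actual figures.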
The UCSC team also used a hidden Markov model (HMM) to detect methylation, but combined it with a hierarchical Dirichlet process mixture model (HDM), a method that "shares statistical strength to robustly estimate a set of complex distributions," the authors wrote. They found that incorporating HDM "enhances HMM's ability to detect cytosine variants." In addition, they developed their tool to distinguish both 5-methylcytosine and 5-hydroxymethylcytosine.
To test their model, the UCSC team generated synthetic DNA strands composed of cytosine, 5-mC, or 5-hmC. After sequencing, they aligned the template and complement strands to a reference sequence and then used their HMM-HDM model to classify each strand. When distinguishing among all three variants, they were able to classify cytosines with a mean accuracy of 76 percent for the template strands and 70 percent for the complement strands. When just calling methylated versus unmethylated cytosines, median accuracy increased to 83 percent and 78 percent for template and complement strands, respectively.
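The classification step can be illustrated with a deliberately simplified Python sketch. It does not reproduce the UCSC team's HMM-HDM method: a single Gaussian per variant stands in for the learned current distributions, and the current values in `MODELS`, like the function names, are invented for the example. The idea it shows is only that each strand is assigned to the variant whose model best explains the observed current.

```python
import math

# Hypothetical (mean, sd) of ionic current, in picoamps, for one k-mer under
# each cytosine variant. These numbers are invented, not real pore-model values.
MODELS = {"C": (52.0, 1.5), "5-mC": (54.0, 1.5), "5-hmC": (56.5, 1.5)}


def gaussian_logpdf(x, mean, sd):
    """Log density of a normal distribution at x."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def classify(events, models=MODELS):
    """Return the variant whose model gives the events the highest total log-likelihood."""
    totals = {
        label: sum(gaussian_logpdf(x, mean, sd) for x in events)
        for label, (mean, sd) in models.items()
    }
    return max(totals, key=totals.get)


print(classify([53.8, 54.3, 54.1]))  # 5-mC
```

When the current distributions of two variants overlap heavily for a given k-mer, this kind of comparison becomes unreliable, which is consistent with accuracy depending on sequence context.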
Both groups found that accuracy varied based on sequence context. The OICR/Johns Hopkins team found that methylation was most likely to be detected when it occurred at the fifth or sixth position in the 6-mer rather than the first position. The UCSC team also noted that "some modified cytosines are readily captured while others are not discernible."
Both groups also wrote that their methods could be extended to identify other modified bases. "It seems likely that the general trend of detection via modulation of the nanopore current will hold true in any type of pore-based sequencing," the authors of the OICR/Johns Hopkins study wrote. In addition, "general models of other DNA damage, resulting from heavy metals, oxidation, UV damage, or other alterations might be detected in natural DNA with this method," they wrote.