NEW YORK – Researchers who have detected DNA N6-methyladenine (6mA) in plant and animal samples may want to take another, closer look. High levels of the modification may be largely due to contamination, either from bacteria in the organism's microbiome or from plasmids introduced by common molecular biology tools.
That's the message from Gang Fang, a researcher at the Icahn School of Medicine at Mount Sinai, whose team published data earlier this month using 6mAScope, a new long-read sequencing-based strategy to identify the source of 6mA in several types of samples, including one-celled eukaryotes, Drosophila, Arabidopsis, and human cell lines and tissues.
"We can only talk about our own samples that we examined," he said. While the study found that 6mA levels in green algae and protists were attributable to those organisms' genomes, in insects and plants more than 90 percent of detected 6mA was linked to likely sources of contamination. "But we want to alert people to pay more attention or maybe revisit some of the earlier reports, especially for multicellular eukaryotes," he said.
The paper, published in Science, is the latest in a string of papers that have cast doubt on earlier reports of elevated and differentiated levels of 6mA in multicellular organisms.
"Some of the techniques we use to detect 6mA are prone to errors," said Eric Greer, an epigenetics researcher at Children's Hospital Boston and Harvard Medical School, who wrote a perspective piece in Science accompanying Fang's paper. Most studies on this topic use either high-performance liquid chromatography with mass spectrometry or long-read sequencing — particularly from Pacific Biosciences. "Both of those techniques have inherent problems that researchers weren't aware of," he said. Groups that have adjusted their methods to account for these problems "are getting a more accurate quantification of 6mA," he said.
But many researchers espouse the view that metazoan genomes can harbor significant levels of directed 6mA modifications. The study is "carefully done, but there are a few things that need to be clarified," said Andrew Xiao, a researcher at Yale University who counts himself among the crowd who think 6mA levels are higher than suggested by the new paper. "There are subtleties in purifying mammalian DNA and cell types which may lead to a different conclusion."
The study did not find high levels of bacterial contamination in its human samples. And limitations of the PacBio-based approach mean it may not be able to identify 6mA in genomic regions with secondary DNA structures, which are thought to harbor many of those modifications in mammalian genomes.
The camps do agree on at least one thing: Only new sequencing methods with single-base resolution are likely to resolve the debate, especially in humans. Luckily, such methods, namely chemical conversions similar to the bisulfite sequencing used to map cytosine methylation, are beginning to arrive.
The start of the debate can be traced back to around 2015, when Greer's team reported finding 6mA in Caenorhabditis elegans genomes and Dahua Chen of the Chinese Academy of Sciences Institute of Zoology reported it in Drosophila. Multiple labs followed suit, and a raft of papers reported surprising results about 6mA levels in animals and plants. Mass spec and long-read sequencing methods had suggested that 6mA, already established as abundant in bacteria and influential in genome regulation, could exist at levels of hundreds of parts per million or more in model systems and human cells.
Although 6mA appeared far less frequent than well-established marks such as 5-methylcytosine, which occurs at about 4 percent of all cytosines in the human genome and is used in several approaches to liquid biopsy testing, the studies raised hopes that 6mA could be an important biomarker for human development and disease.
Xiao said his best guess for 6mA levels in human glioma stem cells was in the range of 10 to 100 ppm. "Sometimes it's higher," he said. "In certain diseases, we see it goes up, sometimes by a factor of 10."
But in 2017, a German research team reported finding no evidence of 6mA in mouse cells and tissues. Researchers, including Greer, began looking for sources of experimental error and for ways to quantify them. Mass spec, while analytically precise, has no way to discriminate between 6mA from the genome of interest and 6mA from a contaminating bacterial genome, Greer said. In 2019, his lab published a study identifying contamination from microbes as well as from the enzymes used to digest and purify samples, which are usually themselves purified from bacteria.
Based on that study, Greer said his best guess for the "true" level of 6mA in human cells was about 1 or 2 ppm. Fang's new sequencing-based method provides a similar estimate, Greer said.
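To put these parts-per-million figures in perspective, the sketch below converts a 6mA level into a rough count of modified adenines per genome copy. It assumes the level is expressed per adenine in double-stranded DNA, a haploid human genome of roughly 3.1 billion base pairs, and about 59 percent A:T base pairs; these are round approximations, not values taken from the studies discussed, and the helper name is hypothetical.

```python
# Rough conversion from a 6mA level (parts per million of adenines) to an
# approximate number of modified adenines per genome copy.
# Assumptions (approximate, not taken from the studies discussed):
#   - the level is measured per adenine in double-stranded DNA, so every
#     A:T base pair contributes exactly one adenine;
#   - haploid human genome of ~3.1 billion bp with ~59% A:T base pairs.

HAPLOID_GENOME_BP = 3.1e9
AT_PAIR_FRACTION = 0.59


def modified_adenines(level_ppm: float,
                      genome_bp: float = HAPLOID_GENOME_BP,
                      at_fraction: float = AT_PAIR_FRACTION) -> float:
    """Approximate number of 6mA sites in one haploid genome copy."""
    adenines = genome_bp * at_fraction  # one adenine per A:T pair
    return adenines * level_ppm / 1e6


for ppm in (1, 2, 10, 100):
    print(f"{ppm:>4} ppm -> ~{modified_adenines(ppm):,.0f} sites per haploid genome")
```

Under these assumptions, 1 to 2 ppm corresponds to roughly 1,800 to 3,700 modified adenines per haploid genome, while 100 ppm would be on the order of 180,000.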
In parallel with mass spec, researchers also used sequencing-based methods to detect 6mA. Some approaches used antibodies to pull down sequences with 6mA, while others used PacBio's ability to detect epigenetic modifications from changes in the kinetics of base incorporation by the polymerase in its sequencing chemistry. (Oxford Nanopore Technologies' platform has been shown to detect 6mA from raw signal, even in Drosophila; however, the firm is still working to officially add that ability to its base-calling software.)
With experience detecting 6mA in bacteria using PacBio, Fang believed his lab was well equipped to try to replicate the initial high-level results. They immediately had success finding 6mA in green algae. "But when we studied other organisms, we had some doubts," he said.
PacBio's method, highly effective at finding 6mA in bacterial genomes, has certain blind spots when applied to plant and animal genomes. To detect DNA modifications, PacBio relies on a quirk of its sequencing chemistry: hiccups in base incorporation at epigenetically modified bases (and at other nearby bases) show up as changes in the time between fluorescence pulses, known as the interpulse duration (IPD). PacBio's algorithm uses these differences in IPD to make epigenetic modification calls.
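The basic idea of kinetic detection can be sketched in a few lines: average the IPDs observed at each position, divide by the IPDs expected for unmodified DNA, and flag positions where the polymerase slowed markedly. This is a simplified illustration, not PacBio's software; the toy numbers, the function names, and the fixed cutoff of 2.0 are all hypothetical.

```python
from statistics import mean

# Minimal sketch of IPD-ratio-based modification calling.
# Each position has a list of interpulse durations (IPDs) observed across reads.
# The control IPDs stand in for unmodified DNA at the same positions.
# The fixed 2.0 cutoff is a hypothetical choice for illustration only;
# PacBio's own tools use statistical models rather than this simple rule.


def ipd_ratios(native_ipds, control_ipds):
    """Return mean(native IPD) / mean(control IPD) for each position."""
    return [mean(native) / mean(control)
            for native, control in zip(native_ipds, control_ipds)]


def call_modified_positions(ratios, cutoff=2.0):
    """Flag positions where the polymerase slowed down markedly."""
    return [pos for pos, ratio in enumerate(ratios) if ratio >= cutoff]


# Toy data: the polymerase pauses at position 2 (a modified base)
# and slightly at position 3 (a neighboring base).
native = [[0.9, 1.1, 1.0], [1.0, 1.0, 1.0], [3.1, 2.9, 3.0], [1.2, 1.4, 1.3]]
control = [[1.0, 1.0, 1.0]] * 4

ratios = ipd_ratios(native, control)
print([round(r, 2) for r in ratios])
print(call_modified_positions(ratios))
```

Position 3 also shows a modest slowdown. Because modified bases can perturb the kinetics of nearby bases, a real caller has to decide which position in such a cluster actually carries the mark.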
When 6mA is abundant, as it is in bacteria, the modification-calling algorithm performs well, Fang and Greer suggested. But simply porting the method to other samples, where 6mA is rarer, may have led to false positives. "To find very rare things, it's very challenging to avoid false positive calls," Fang said.
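Fang's point about rare events is a base-rate effect, which a little arithmetic makes concrete. The sketch assumes a hypothetical caller with 90 percent sensitivity and one false call per 10,000 unmodified sites, then applies it to a sample where 1 percent of adenines are modified and to one where only 1 ppm are; all parameter values are illustrative, not measured.

```python
# Why rare marks are hard to call: expected true and false positives when
# a caller with fixed sensitivity and per-site false-positive rate is
# applied to sites where the mark is common versus rare.
# All parameter values below are hypothetical, chosen for illustration.


def expected_calls(n_sites, prevalence, sensitivity, false_positive_rate):
    """Return (true positives, false positives, precision) in expectation."""
    true_pos = n_sites * prevalence * sensitivity
    false_pos = n_sites * (1 - prevalence) * false_positive_rate
    precision = true_pos / (true_pos + false_pos)
    return true_pos, false_pos, precision


N_SITES = 1_000_000     # adenines examined
SENSITIVITY = 0.9       # hypothetical
FALSE_POS_RATE = 1e-4   # hypothetical: 1 false call per 10,000 unmodified sites

for label, prevalence in [("abundant (1%)", 0.01), ("rare (1 ppm)", 1e-6)]:
    tp, fp, prec = expected_calls(N_SITES, prevalence, SENSITIVITY, FALSE_POS_RATE)
    print(f"{label:>14}: {tp:8.1f} true, {fp:6.1f} false, precision {prec:.3f}")
```

With the same caller, about 99 percent of calls are real when the mark is abundant, but fewer than 1 percent are real when it is rare, because false calls on the vast number of unmodified sites swamp the handful of true ones.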
PacBio's platform can't distinguish between 6mA and 1mA (N1-methyladenine), a DNA damage mark, Greer said, which could have introduced more error. And until the recent introduction of the HiFi sequencing protocol, PacBio's long reads were error-prone, with Phred scores topping out around Q20, or about a 1 percent error rate. Lastly, and perhaps most importantly, the approach requires a comparison to a reference genome, namely that of the organism of interest. How exactly that comparison is made isn't clear. According to a PacBio white paper, IPD ratios can be calculated using either amplified controls, in which the sample DNA is copied into unmodified DNA, or in silico controls, in which the IPDs expected for unmodified DNA are predicted computationally. PacBio declined to respond to a list of detailed questions about its modified base detection methods.
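The link between a Q20 score and a 1 percent error rate comes from the Phred scale, on which the quality score Q and the per-base error probability p are related by Q = -10 log10(p). A minimal sketch of the conversion in both directions:

```python
import math

# Phred quality scores relate to per-base error probability p by
# Q = -10 * log10(p), so every 10 points is a tenfold drop in error.


def phred_to_error(q: float) -> float:
    """Error probability implied by a Phred score."""
    return 10 ** (-q / 10)


def error_to_phred(p: float) -> float:
    """Phred score implied by an error probability."""
    return -10 * math.log10(p)


for q in (10, 20, 30):
    err = phred_to_error(q)
    print(f"Q{q}: {err:.3%} error, {1 - err:.1%} accuracy")
```

Q20 thus corresponds to one error in 100 bases; at low 6mA abundance, even that error rate can matter when calls are made on individual bases.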
"While our original base modification detection workflow measured if a base was methylated after alignment of the reads to a reference genome, the work here shows that methylation can be detected on single DNA molecules before reference alignments and sequence identification, which simplifies the analysis and opens many exciting opportunities such as the one described here," PacBio CSO Jonas Korlach said in a statement. "By detecting 6mA on individual DNA molecules, without the need for chemical conversions, with PacBio's highly accurate HiFi sequence reads to simultaneously inform about their molecular identity, the study represents another example that demonstrates the power of HiFi sequencing to provide deeper insights into the complex biology of genomes and epigenomes."
Fang's team sought to address two potential issues with PacBio's method. First, they boosted accuracy by applying circular consensus sequencing to short DNA fragments of 200 to 400 bp, so that the polymerase reads each molecule many times. Second, these sequences were not assumed to come from the genome of interest; they were also compared with potential sources of contamination, and a machine learning model assigned each sequence to its most likely source. Only then were 6mA levels quantified for each species. Results were validated by mass spec.
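The "assign first, quantify second" logic can be illustrated with a simple tally. The sketch below does not reproduce 6mAScope's consensus step or its machine learning classifier; it assumes each read already carries a source label and simply computes 6mA levels separately per source. The Read structure, the source names, and the read counts are all made up for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass

# Sketch of per-source 6mA quantification. Each read is assumed to carry a
# source label already (in the study this came from a machine learning model;
# here the labels are simply given). Structure and numbers are hypothetical.


@dataclass
class Read:
    source: str      # e.g. "host" or "gut_bacteria"
    adenines: int    # adenines covered by this read
    m6a_calls: int   # adenines called as 6mA on this read


def per_source_summary(reads):
    """Return {source: (6mA ppm within that source, share of all 6mA calls)}."""
    adenines = defaultdict(int)
    calls = defaultdict(int)
    for read in reads:
        adenines[read.source] += read.adenines
        calls[read.source] += read.m6a_calls
    total_calls = sum(calls.values())
    return {
        source: (calls[source] / adenines[source] * 1e6,
                 calls[source] / total_calls if total_calls else 0.0)
        for source in adenines
    }


# 20,000 short host reads (2 carrying a 6mA call) and 100 bacterial reads.
reads = ([Read("host", 100, 0)] * 19_998
         + [Read("host", 100, 1)] * 2
         + [Read("gut_bacteria", 100, 2)] * 100)

for source, (ppm, share) in per_source_summary(reads).items():
    print(f"{source:>12}: {ppm:10.1f} ppm, {share:.1%} of all 6mA calls")
```

In this made-up example the host reads sit at 1 ppm, yet about 99 percent of all 6mA calls come from the small bacterial fraction; pooling every read without source assignment would report roughly 100 ppm.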
For multicellular organisms, the results were stark. Less than 2 percent of the 6mA in Drosophila samples was attributable to the fly genome. Instead, 6mAScope suggests the modifications came from gut microbes. For Arabidopsis, more than 95 percent of 6mA came from soil bacteria, Fang said.
Given that bacteria have potentially 1,000 times more 6mA than multicellular organisms, even a little contamination could have a big effect on results, he suggested.
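Fang's point can be made concrete with a two-component mixing model: the measured level is the host level and the contaminant level weighted by their shares of the DNA. The sketch assumes a hypothetical host level of 1 ppm, a bacterial level 1,000 times higher as quoted above, and similar adenine content in host and contaminant DNA; the function names are illustrative.

```python
# Two-component mixing model for how contamination inflates an apparent
# 6mA level. Assumes contaminant and host DNA have similar adenine content,
# so the contamination fraction applies directly to adenine counts.


def apparent_level(host_ppm, contaminant_ppm, contamination_fraction):
    """6mA level (ppm) measured on a mixture of host and contaminant DNA."""
    c = contamination_fraction
    return (1 - c) * host_ppm + c * contaminant_ppm


def contaminant_share(host_ppm, contaminant_ppm, contamination_fraction):
    """Fraction of the measured 6mA that actually comes from the contaminant."""
    c = contamination_fraction
    return c * contaminant_ppm / apparent_level(host_ppm, contaminant_ppm, c)


def break_even_fraction(host_ppm, contaminant_ppm):
    """Contamination fraction at which half of all 6mA is from the contaminant."""
    return host_ppm / (host_ppm + contaminant_ppm)


HOST_PPM = 1.0          # hypothetical host level
BACTERIA_PPM = 1000.0   # 1,000 times the host level (hypothetical)

for c in (0.001, 0.01, 0.05):
    level = apparent_level(HOST_PPM, BACTERIA_PPM, c)
    share = contaminant_share(HOST_PPM, BACTERIA_PPM, c)
    print(f"{c:.1%} contamination: apparent {level:.2f} ppm, {share:.0%} from bacteria")
print(f"break-even contamination: {break_even_fraction(HOST_PPM, BACTERIA_PPM):.3%}")
```

Under these assumptions, contamination of about 0.1 percent already doubles the apparent level, and 1 percent contamination makes roughly 90 percent of the measured 6mA bacterial.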
The study's human samples were not contaminated by bacteria, Fang said. "But we also didn't find high levels of 6mA. We cannot say if previous findings were due to contamination." In the paper, his team suggested that some of the 6mA in human samples was explained by plasmid or Escherichia coli genomic DNA.
Xiao, Greer, and even Fang were clear that the new paper has several limitations. Fang stressed that his results applied only to the specific sample types he analyzed for the paper. He wouldn't speculate about other sample types. "Because we did not have access to samples, we don't mean that the previous studies are wrong," he said.
Xiao noted that the study analyzed glioblastoma tissues, and not specifically glioblastoma stem cells. "These are a very small population from the glioblastoma tissue," he said. "If you grind up the whole tissue and analyze with mass spec and sequencing, I don't think they'll show up very dramatically."
"You do not have an apples-to-apples comparison here," he said.
What's more, 6mAScope may have its own blind spots. In bacteria, 6mA is associated with a "GATC" motif in double-stranded DNA, Xiao said. "Mammalian 6mA doesn't seem to have a motif and it occurs in the regions with lots of DNA secondary structures, which are challenging for SMRT-seq."
In addition to potentially missing 6mA in secondary structures, the control for 6mAScope in mammalian cells involves overexpressing a bacterial methyltransferase "which only modifies the 'GATC' sequence," Xiao said. "The detection limit of 6mAScope in mammalian genomes is based on a motif-driven model in dsDNA conformation; therefore, it is of interest to see if this model can be extrapolated to non-motif-driven regions enriched for DNA secondary structures."
Fang responded that bacterial 6mA events do not only occur at "GATC" sites. "I think this is a general misunderstanding in the eukaryotic epigenetics field," he said. "Instead 6mA events are on a large diversity of sequence contexts. In our paper, we demonstrated that 6mA has well-defined signature in SMRT sequencing across very rich sequence contexts." He noted that the study included additional controls besides those with overexpressed methyltransferase, in which 6mA was introduced at other motifs.
Greer noted that while Fang's team looked at mitochondrial DNA and found little evidence of abundant 6mA, they did not look at cells under stress, and the modifications could be induced under certain biological conditions.
And for samples with low 6mA abundance, 6mAScope can only estimate the overall quantity of modifications, not their specific locations (which it can provide for bacteria and unicellular eukaryotes, Fang noted).
Both Greer and Xiao suggested that this debate over 6mA will persist until researchers can map the modifications with single-base resolution. Aside from the absolute levels of 6mA modifications, a major question is whether they are distributed randomly across the genome.
"We don't know where these sites are, if they're scattered through the genome, or concentrated on critical regulatory regions," Xiao said. "Only something equivalent to bisulfite sequencing is going to solve that issue. That's the most critical issue in the field right now."
Thankfully, chemical conversion sample prep methods are arriving. One of the most promising, Nitrite-seq, involves a chemical conversion similar to the bisulfite conversion used to map cytosine methylation. Developed by researchers at Canada's York University, it uses sodium nitrite under acidic conditions to convert all adenines except 6mA so that they are read as guanine. Greer said his lab has conceived of another possible chemistry but has not yet developed it.
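The readout logic of such a conversion mirrors bisulfite sequencing: after treatment, any reference adenine that still reads as A marks a 6mA, and the fraction of reads retaining A at a site estimates how often that site is modified. The sketch below models only that idealized readout as described here; it assumes complete conversion, no effect on other bases, and gap-free alignment to the reference, and the sequence and function names are hypothetical.

```python
# Idealized readout logic for a conversion method in which every unmodified
# adenine is read as guanine while 6mA is still read as adenine.
# Assumes complete conversion, no effect on other bases, and reads already
# aligned to the reference without gaps; real chemistry is messier.


def convert(sequence, m6a_positions):
    """Simulate the sequencing readout of one molecule after conversion."""
    return "".join(
        "G" if base == "A" and i not in m6a_positions else base
        for i, base in enumerate(sequence)
    )


def m6a_fraction_per_site(reference, reads):
    """For each reference adenine, the fraction of reads that still show A."""
    fractions = {}
    for i, ref_base in enumerate(reference):
        if ref_base != "A":
            continue
        retained = sum(1 for read in reads if read[i] == "A")
        fractions[i] = retained / len(reads)
    return fractions


reference = "GATCAAGT"
reads = [
    convert(reference, {1}),    # molecule carrying 6mA at position 1
    convert(reference, set()),  # unmodified molecule
]
print(reads)
print(m6a_fraction_per_site(reference, reads))
```

Because each read reports on a single molecule at base resolution, this kind of readout addresses the question the field cares most about: where the modifications sit, not just how many there are.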
Xiao noted that conversion would enable analysis of both single- and double-stranded DNA.
Chemical conversion is "going to help me feel much more confident as to where this base is actually occurring," Greer said.