This article has been updated to correct the spelling of Biomodal and to include additional information from the company.
MINNEAPOLIS – Long-read sequencing can match — or even exceed — the performance of short-read conversion-based methylation analysis, two new benchmarking studies show.
On Monday, representatives of the Association for Biomolecular Resource Facilities (ABRF) DNA sequencing research group presented data from their comparison of five approaches for detecting 5-methylcytosine in CpG sites: the Zymo Research, IDT, and New England Biolabs conversion-based sample preps with sequencing on the Illumina NovaSeq X instrument, as well as native, single-molecule methylation analysis offered by long-read methods from Pacific Biosciences and Oxford Nanopore Technologies.
"Across the different samples and different sites, it all looks very good," said Xiaoyu Zhuo, a bioinformatician at Washington University in St. Louis who led the ABRF benchmarking study with Molly Zeller, lab manager at the University of Wisconsin-Madison Biotechnology Center sequencing core facility.
That matches results published last month by researchers at DeCode Genetics in Genome Biology, which compared PacBio and Oxford Nanopore's methods with bisulfite sequencing from Biomodal, formerly known as Cambridge Epigenetix.
A spokesperson from Biomodal pointed out that its bisulfite sequencing product used in the paper is a legacy product the company has not sold directly or manufactured "for some time" as it has moved entirely to an enzymatic approach.
"The methylation predictions from all methods are highly correlated and consistent," the authors wrote. "They all replicate known 5-mCpG distributions in the human genome, such as the lack of 5-mCpG in promoter sequences."
"I'm happy they're doing it; it's very important work," said Claudia Lalancette of the University of Michigan on the sidelines of the ABRF annual meeting. She is the managing director of an epigenomics core lab and was not involved in the study, though she was involved in a previous benchmarking study, called Sequencing Quality Control 2 (SEQC2), which explored methylation sequencing. "Long-read [methylation] sequencing is important because in some research, especially cancer, you want to see if there's a genetic modification close to your change in methylation. That's a struggle to do for short reads." She noted that the SEQC2 study did not include PacBio data.
The results should boost confidence in long-read methods, which offer certain technical advantages over short reads. "ONT and PacBio will outperform bisulfite sequencing in complicated regions, such as dark regions. Additionally, the phasing is more accurate and better suited for allele-specific methylation studies," Brynja Sigurpálsdóttir, a bioinformatician at DeCode Genetics and first author of the Genome Biology paper, said in an email. Her paper noted that long reads detected about 3 percent more methylated CpG sites than bisulfite sequencing.
On long-read platforms, methylation information also "comes 'free' with the sequencing, so companies such as DeCode, which were not too interested in methylation, started doing it on a large scale because we were sequencing on a large scale anyway," she added, noting that bisulfite sequencing will still be cheaper for those only interested in methylation analysis.
The benchmarking studies offer some of the largest datasets to date comparing the various approaches to methylation analysis by sequencing.
"We bit off a lot," Zeller said. The ABRF study analyzed four DNA samples — fully methylated and unmethylated controls as well as the 12878 human cell line and the MCF7 human breast cancer cell line — prepared in five ways across 10 different sites. They also analyzed the samples using Illumina microarrays. All sequencing was performed at WashU.
Zeller noted that the study was done with $102,000 of in-kind support from vendors.
The DeCode study included 132 human DNA samples that were sequenced using both nanopore and bisulfite methods. The researchers analyzed an additional 50 whole-blood samples using PacBio sequencing.
The comparisons revealed several quirks of the various methods. The ABRF study, for example, found "spurious" non-methylation calls in some bisulfite methods that align reads to a reference genome. "A CG to TG mutation in the sample will look like unmethylated cytosine," Zhuo said, while the PacBio instrument will ignore it. He later noted that this is a known issue and that there are several strategies for dealing with it, including variant calling prior to methylation calling.
These spurious calls could affect downstream applications. "Suppose the methylation detection algorithm is forced to output the methylation ratio for each CpG in the genome," Sigurpálsdóttir said, though her team has not yet looked for this pattern in their data. "The methylation estimates will be reported for many positions where the C has mutated to a T. This can lead to false interpretations of methylation patterns and incorrect methylation calls."
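One of the strategies Zhuo mentioned, variant calling prior to methylation calling, can be illustrated with a minimal Python sketch. It is not the ABRF or DeCode pipeline. The `CpGCall` and `Variant` record types and the `mask_variant_cpgs` function are hypothetical names introduced only for this example, and the sketch assumes that each CpG is keyed by the 1-based reference position of its cytosine and that variants have already been called for the same sample.

```python
from typing import Iterable, List, NamedTuple, Set, Tuple


class CpGCall(NamedTuple):
    """Hypothetical per-site methylation summary (not a real tool's format)."""
    chrom: str
    pos: int           # 1-based reference position of the C in the CpG
    methylated: int    # reads supporting methylation
    unmethylated: int  # reads supporting no methylation


class Variant(NamedTuple):
    """Hypothetical single-nucleotide variant called in the same sample."""
    chrom: str
    pos: int  # 1-based reference position
    ref: str
    alt: str


def mask_variant_cpgs(calls: Iterable[CpGCall],
                      variants: Iterable[Variant]) -> List[CpGCall]:
    """Drop CpG calls whose C (or the G read from the opposite strand) has mutated.

    After bisulfite conversion, a real C>T variant is indistinguishable from an
    unmethylated cytosine, so such sites are removed instead of reported.
    """
    blocked: Set[Tuple[str, int]] = set()
    for v in variants:
        if v.ref == "C" and v.alt == "T":
            # Forward-strand C>T at the cytosine of the CpG.
            blocked.add((v.chrom, v.pos))
        elif v.ref == "G" and v.alt == "A":
            # G>A at the guanine is a C>T on the reverse strand;
            # the CpG is keyed by the position one base upstream.
            blocked.add((v.chrom, v.pos - 1))
    return [c for c in calls if (c.chrom, c.pos) not in blocked]


calls = [
    CpGCall("chr1", 100, 0, 20),   # looks unmethylated, but the C is mutated
    CpGCall("chr1", 200, 18, 2),
    CpGCall("chr1", 300, 1, 19),   # the G at position 301 is mutated
]
variants = [
    Variant("chr1", 100, "C", "T"),
    Variant("chr1", 301, "G", "A"),
]
kept = mask_variant_cpgs(calls, variants)
print([c.pos for c in kept])
```

Masking the site outright is the simplest choice. A real analysis might instead handle heterozygous variants per allele, which this sketch does not attempt.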
Some cancers, such as those found in the head and neck, have very high rates of C to T SNPs due to the activity of APOBEC, a deaminase, Lalancette noted.
Zhuo also noted that their data showed partial methylation of the unmethylated standard from Zymo. "I think Zymo knows about this now," he said. The firm did not immediately respond to a request for comment.
The DeCode study reported some strand bias in Oxford Nanopore data, meaning there was a difference in the estimated methylation rates of the forward and reverse strands of DNA. "We see this mainly in the early days of ONT data, and it has improved in data sequenced using R10.4 flow cells and more recent versions of [the] Guppy/Dorado [basecallers]," Sigurpálsdóttir said. "We did not measure the strand bias in PacBio data as we did not have strand information available per read."
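Strand bias as described here, a difference between the methylation rates estimated from the forward and reverse strands, can be expressed as a simple per-site calculation. The Python sketch below is illustrative only and is not the metric defined in the DeCode paper. The function names and the read counts are assumptions made for the example.

```python
from typing import Optional


def methylation_rate(methylated: int, total: int) -> Optional[float]:
    """Fraction of reads called methylated, or None when there is no coverage."""
    if total == 0:
        return None
    return methylated / total


def strand_bias(fwd_methylated: int, fwd_total: int,
                rev_methylated: int, rev_total: int) -> Optional[float]:
    """Forward-strand rate minus reverse-strand rate at one CpG site.

    Returns None if either strand has no reads, because the difference
    is undefined in that case.
    """
    fwd = methylation_rate(fwd_methylated, fwd_total)
    rev = methylation_rate(rev_methylated, rev_total)
    if fwd is None or rev is None:
        return None
    return fwd - rev


# Made-up counts: 18 of 20 forward reads and 12 of 20 reverse reads methylated.
bias = strand_bias(18, 20, 12, 20)
print(bias)
```

A value near zero means the two strands agree. Computing it requires per-read strand information, which Sigurpálsdóttir said was not available for the PacBio data.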
Sigurpálsdóttir said that even though 20X coverage is recommended with Oxford Nanopore, "there is still high consistency between 10X to 20X and [bisulfite] data," especially with updated flow cells and improved detection algorithms. "[Early Oxford Nanopore] data is still highly consistent and can be filtered for enhanced accuracy," she said.
Zhuo said the ABRF group would like to extend the study in several directions, including phasing and allele-specific methylation detection. It also wants to look for methylation in repeats and other hard-to-map genomic regions and at the differentially methylated calls between methods.
Lalancette said she hopes that the ABRF group will include additional methylation sequencing sample prep kits, such as those from Biomodal, especially kits that can detect 5-hydroxymethylcytosine, another important biomarker. "What they've provided is a template to move forward," she said.