NEW YORK – Researchers led by a team at the Chinese University of Hong Kong (CUHK) have developed a new method that greatly improves the detection of DNA cytosine methylation using Pacific Biosciences single-molecule real-time sequencing data.
The approach, called the holistic kinetic (HK) model and described in a paper in PNAS on Monday, considers kinetic signals from the DNA polymerase used in PacBio sequencing, as well as the sequence context, to determine cytosine methylation sites. After training a methylation classification model using a convolutional neural network, the researchers reported being able to detect cytosine methylation, or 5-methyl-C, with 90 percent specificity and 94 percent sensitivity.
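For readers who want the two reported figures spelled out: sensitivity is the share of truly methylated sites that a classifier flags as methylated, and specificity is the share of truly unmethylated sites that it leaves unflagged. The sketch below is illustrative only. The counts are hypothetical, chosen to reproduce the reported percentages, and are not data from the paper.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of truly methylated sites that were called methylated."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of truly unmethylated sites that were called unmethylated."""
    return true_neg / (true_neg + false_pos)


# Hypothetical counts (NOT from the paper): 1,000 methylated and
# 1,000 unmethylated CpG sites with known ground truth.
tp, fn = 940, 60    # methylated sites: called correctly / missed
tn, fp = 900, 100   # unmethylated sites: called correctly / miscalled

print(f"sensitivity = {sensitivity(tp, fn):.0%}")
print(f"specificity = {specificity(tn, fp):.0%}")
```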
"The advantage of the method described in our PNAS paper is that it allows simultaneous determination of DNA [sequence] and CpG methylation in one go, with high base-calling accuracy, and without the use of a step that is destructive to the input DNA," said corresponding author Dennis Lo, director of the Li Ka Shing Institute of Health Sciences at CUHK, in an email. Another advantage is that long-range methylation haplotypes can be determined, he said, but a limitation is the current throughput of the PacBio sequencing platform.
The new method complements existing techniques for cytosine methylation detection, including bisulfite sequencing and nanopore sequencing from Oxford Nanopore Technologies. It also adds a new capability to PacBio's sequencing technology, which has so far been lacking when it comes to direct 5-methyl-C detection.
According to the authors, potential applications of the patent-pending method include epigenetic studies in humans and other organisms as well as molecular diagnostics. Take2, a Hong Kong-based startup cofounded by Lo that has developed a screening test for nasopharyngeal cancer, has taken an exclusive license to the technology.
"I look at it with cautious optimism," said Winston Timp, an associate professor of biomedical engineering at Johns Hopkins University who has worked extensively on DNA methylation detection, in particular using nanopore sequencing. "If they can keep pushing on it and develop it, and the tool is made available, and others can reproduce these results, it's pretty exciting."
Timp said PacBio's technology has already been very adept at directly calling another type of DNA methylation, 6-methyl-adenine, but much less accurate for direct 5-methyl-cytosine detection.
"We welcome the efforts of the research community, either independently or collaboratively, to improve all aspects of [single-molecule real-time] sequencing performance and applications development, of which this study is a very nice example," said Jonas Korlach, CSO of Pacific Biosciences, in an email.
Cytosine methylation often occurs at CpG dinucleotides, which frequently cluster in so-called CpG islands, and plays an important role in gene regulation. A well-established method for measuring 5-methyl-C is bisulfite sequencing, where the DNA is treated with bisulfite to convert unmethylated cytosines to uracil and is then sequenced, mostly with short-read technologies like Illumina's but also with long-read platforms from Pacific Biosciences or Oxford Nanopore.
But a problem with bisulfite sequencing is that the harsh chemical conversion step tends to degrade the input DNA, so it requires relatively large amounts of starting material, which is not always available. Also, short reads cannot link methylation calls over larger distances in the genome, which is needed to build methylation haplotypes. "The technology described in our PNAS paper addresses both of these problems in one stroke," Lo said.
Another issue with bisulfite sequencing, he said, is that it can be difficult to distinguish a genuine T that stems from a polymorphism from a T that resulted from the bisulfite conversion of an unmethylated C.
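The ambiguity can be illustrated with a toy simulation. This is a minimal sketch under simplifying assumptions (a single strand, complete conversion, no sequencing errors). The sequences, the methylated position, and the `bisulfite_convert` helper are all hypothetical and are not taken from the paper.

```python
def bisulfite_convert(seq: str, methylated: set[int]) -> str:
    """Simulate an idealized bisulfite read of one strand.

    Unmethylated C is converted to uracil and read out as T;
    methylated C (at the given 0-based indices) is protected and stays C.
    """
    out = []
    for i, base in enumerate(seq):
        if base == "C" and i not in methylated:
            out.append("T")
        else:
            out.append(base)
    return "".join(out)


# Hypothetical reference with a methylated C at index 1
# and unmethylated Cs at indices 4 and 7.
reference = "ACGTCGAC"
# Hypothetical individual carrying a genuine C>T polymorphism at index 4.
variant = "ACGTTGAC"

ref_read = bisulfite_convert(reference, methylated={1})
var_read = bisulfite_convert(variant, methylated={1})

print(ref_read)
print(var_read)
# Both reads show T at index 4, so the read alone cannot say whether
# that T is a converted unmethylated C or a real polymorphism.
print("indistinguishable:", ref_read == var_read)
```

In this toy case the two reads are identical, which is the situation Lo describes: without additional information, such as an unconverted sequence from the same sample, the T at that position cannot be attributed.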
Nanopore sequencing has also been used to directly detect 5-methyl-C. Timp's group, for example, demonstrated this in a paper published in Nature Methods in 2017. New analysis methods for Oxford Nanopore data, compared in a recent preprint, have improved the accuracy of direct cytosine methylation detection, Timp said, in particular a tool called Megalodon. The accuracy is broadly similar to what Lo's group reported in its paper, he added.
But Oxford Nanopore's base calling accuracy is still "significantly worse" than that of PacBio or Illumina sequencing, Lo said, though an advantage of nanopore sequencing is that the hardware is "relatively inexpensive" compared to those other platforms.
Timp said it's "a completely fair point" that nanopore data has a lot of insertion and deletion errors, in particular compared to PacBio's HiFi reads, which reach high accuracy by sequencing the same DNA molecule several times over. "However, it depends on what your application is, whether this would matter to you," he said. In addition, nanopore base accuracy has been improving with newer base callers. "It's not at the level of HiFi, but from a single pass, they're still getting pretty high accuracy," he said.
It is unclear why Pacific Biosciences has not commercialized a method of its own yet for highly accurate direct 5-methyl-cytosine detection. The company declined to comment on its internal efforts at this time. Lo said he is not aware of any peer-reviewed publications that describe PacBio-based genome-wide CpG-site methylation analysis with similar results to those of his own group.
Timp also said he has not seen any work on highly accurate direct 5-methyl-C detection with PacBio's platform. He speculated that methylation sequencing may just not have been a commercial priority for the company. "PacBio has been focusing, as has Oxford Nanopore to some extent, on accuracy and read length and yield," he said. "Base modifications, although I love them, and many people do love them, are not, let's say, the major market compared to some of these other things."
Lo said his team plans to make the software available to academic researchers who would like to replicate his group's work, adding that those interested in gaining access should contact him directly.
Take2, having taken an exclusive license to the methodology, will be in charge of its commercialization. "We believe that this is a powerful platform for epigenomic analysis and would help Take2 fulfill its mission as a health informatics company," Lo said, though he declined to provide further information about the commercialization pathway at this time. The company recently had a soft launch of its nasopharyngeal carcinoma screening test, he said, but two years ago, he indicated that the firm's future would lie in health informatics.
One potential diagnostic application of the approach could be to determine the tissue of origin for cancers of unknown origin, he said. This would involve analyzing long-range methylation haplotype information in tumor samples from surgery or biopsy.
For the scientific community, it will be a definite plus to have not one but two long-read-sequencing 5-methyl-cytosine detection methods available, Timp said. For example, the new PacBio method could be used for combined methylation and chromatin accessibility studies, similar to the Nanopore sequencing of Nucleosome Occupancy and Methylome (NanoNOMe) approach his team published last year.
"Depending on your application, you could do HiFi or you could do nanopore, and it's good to have more tools," he said. "I would not necessarily say that this is going to blow nanopore [sequencing] out of the water, but at the same time, this allows PacBio's methylation [sequencing] to reenter the field."