NEW YORK (GenomeWeb) – Canadian researchers have published the latest proof of principle for a method that they think will help increase the accuracy and decrease the cost of methylation-specific sequencing tests being developed for early cancer detection.
With their method, called cfMeDIP-Seq (cell-free methylated DNA immunoprecipitation and high-throughput sequencing), the investigators from Princess Margaret Cancer Centre and the University of Toronto believe they have hit on an approach that is unique not just in its sequence analysis, but also at a more fundamental, technical level.
Although numerous academic groups and companies are now planning to launch clinical tests in this vein, much of the work so far has relied on bisulfite conversion methods, which are the mainstay of epigenetic sequencing. Various groups have sought to distinguish themselves through their methods for analyzing the resulting data and for training algorithms that can pick out cancer cases from non-cancer cases.
Daniel De Carvalho, a senior scientist at Princess Margaret Cancer Centre and lead author of the study published in Nature last week, said that methylation analysis is meant to solve liquid biopsy's needle-in-the-haystack problem: the signal-to-noise issue posed by the scarcity of recurrent cancer mutations and their low levels compared to background normal DNA. Ironically, he added, that problem rears its head again because of the nature of traditional bisulfite-based sequencing methods.
"Bisulfite conversion is the gold standard in tissue," he said. "But it’s a chemical modification, and you destroy [a large amount] of the DNA."
That's not a problem with tissue, he said, because you have samples with high fractions of tumor cells. But in a blood sample, especially one in which you are trying to find the first hint of an occult cancer, you have precious little tumor DNA present.
A second issue, De Carvalho said, is that whenever bisulfite conversion fails, it produces noise. You can reduce this by doing a harsher process, but then you lose even more DNA, so you face a tradeoff between reducing background noise and maintaining sensitivity.
The method he and his colleagues developed also addresses a more basic issue: most genomic DNA is useless for methylation analysis because it doesn't contain any CpG sites, so sequencing it wastes reads and increases cost.
"You are spending a lot to get a little," De Carvalho said. As a result, while whole-genome bisulfite sequencing is being used by companies like Grail in discovery efforts, the cost and efficiency challenges involved have led other groups and companies to focus on identifying predefined epigenetic signatures that can be assayed using PCR.
The cfMeDIP-Seq method is an optimized version of an existing low-input MeDIP-seq protocol. It retains the open-endedness of sequencing, avoiding the predetermined targets of PCR, while reducing cost and increasing sensitivity by enriching a sample for CpG-rich fragments.
According to De Carvalho, the major development described in the new study is a way to do immunoprecipitation of CpG sites using the small amounts of DNA present in a blood sample. This allows the researchers to pull down all methylated DNA, leaving the rest of the genome behind, and then to sequence much more efficiently.
Since the technique works on low-input samples and is cheap compared to whole-genome bisulfite sequencing, you can perform it on biobanked samples, and a lot of them at that, he said. This is what he and his colleagues did in their study, using machine learning to generate a classifier that could distinguish cancer samples from non-cancer samples.
Importantly, he said, a classifier generated from blood samples themselves appears to predict cancer more accurately than one built on methylation signals gleaned from tumor tissue datasets, an approach some previous tests have taken.
In the study, the Toronto team conducted a variety of experiments showing that their approach can detect methylation patterns that distinguish individuals with cancer from those without, including a study of 24 early-stage pancreatic cancer patients and healthy controls, and another examination of 388 samples from seven tumor types.
After training on one set of samples and then applying the resulting classifier to a separate validation set in this multi-cancer cohort, the authors reported that they could achieve areas under the receiver operating characteristic curve of up to 0.98 in distinguishing AML, 0.92 in distinguishing pancreatic cancer, 0.97 for lung cancer, and 0.96 for picking out healthy controls.
"Notably, performance was similar between early- and late-stage samples, suggesting applicability to the detection of early-stage cancers," the authors added.
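For readers unfamiliar with the metric, the area under the receiver operating characteristic curve can be read as the probability that a classifier scores a randomly chosen cancer sample higher than a randomly chosen non-cancer sample. The short Python sketch below illustrates that definition only. It is not the authors' code, the function name `auroc` is hypothetical, and the scores and labels are invented for illustration rather than taken from the study.

```python
def auroc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Returns the fraction of (positive, negative) pairs in which the
    positive sample has the higher score; ties count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Invented held-out scores (1 = cancer, 0 = control); NOT data from the study.
validation_scores = [0.91, 0.80, 0.65, 0.40, 0.70, 0.30, 0.20, 0.10]
validation_labels = [1, 1, 1, 1, 0, 0, 0, 0]

print(auroc(validation_scores, validation_labels))  # 0.875
```

In this toy example, 14 of the 16 cancer-control pairs are ranked correctly, giving 0.875. A value of 0.5 corresponds to chance and 1.0 to perfect separation, which puts the reported figures of 0.92 to 0.98 in context.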
Moving forward, De Carvalho said that the group is open to collaboration with other academic groups and with commercial entities, but has not announced any specific next steps.
On the research side, he said that the team plans to follow up the current study with an analysis of population data from large health study cohorts, which will provide a bridge to future prospective validation.
"Because we can do this with low input, and it's more cost effective, we can access these samples from studies of hundreds of thousands of people followed over many years," he said. "We know which person did develop cancer and which did not, and we can test samples from before their diagnosis," he added, which should help refine the specificity of the classifier even further.