This article has been updated to correct an error describing the labs conducting the research.
NEW YORK – Investigators from Stanford University have developed a new method for analyzing circulating cell-free DNA fragmentation patterns, which they believe could expand upon various epigenetic methods being developed for early cancer detection to offer clues about the expression of individual genes and gene panels without analyzing actual mutations.
Presenting the method, called EPIC-Seq (Epigenetic Profile Inference from CfDNA Deep Sequencing), at this week's second virtual session of the American Association for Cancer Research's annual meeting, Stanford research fellow Mohammad Esfahani described his team's experiments applying the technique to tumor subtyping.
According to Esfahani, the group's experiments with EPIC-Seq so far suggest that the method can accurately predict tissue- and tumor-specific gene expression in plasma samples from healthy individuals and cancer patients, without knowledge of specific gene mutations or sequencing of expression products.
During a discussion of the AACR session, Memorial Sloan Kettering physician-scientist Jorge Reis-Filho called the approach "immensely exciting" for its potential to allow the use of epigenetic information to characterize the biology of cancers in addition to merely detecting them.
Esfahani said that EPIC-Seq builds on earlier methods demonstrating that it is possible to detect cancer and infer the origin of circulating DNA molecules in the body based on fragmentation patterns and nucleosome positioning, as well as other studies exploring the relationship between nucleosome positions and gene expression.
But even with these discoveries, it hasn't yet been clear that epigenetic information from cell-free DNA could robustly predict RNA expression for individual genes in a way that might address clinical challenges, like cancer subtyping and classification, he argued.
Speaking during the AACR presentation, Esfahani said that previous research by other groups had already shown that nucleosome positioning correlates with gene expression, but that making such approaches work for specific applications of interest would require impractically high levels of circulating tumor DNA in a patient sample.
"We asked, 'are there other sources of information that can be used … like fragment length … that can improve gene expression inferences?'" he said.
The basis of the EPIC-Seq approach was a hypothesis that less protected nucleosomes should yield a more diverse population of DNA fragment lengths, because these regions have a higher chance of being cut randomly, Esfahani explained. Therefore, a measure of DNA fragment size distribution, or diversity, could be a surrogate for the nucleosomal factors previously determined to predict gene expression.
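One common way to quantify the diversity of a fragment size distribution is Shannon entropy. The sketch below is illustrative only and is not the Stanford group's implementation: the function name `fragment_length_entropy` and the two toy lists of fragment lengths are hypothetical, chosen to show that a tightly clustered set of lengths scores lower than a widely spread one.

```python
import math
from collections import Counter


def fragment_length_entropy(lengths):
    """Shannon entropy (in bits) of the empirical fragment-length distribution."""
    counts = Counter(lengths)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("need at least one fragment length")
    # Sum -p * log2(p) over each distinct fragment length observed.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical toy data (base pairs), not taken from the study.
# A well-protected region: fragments cluster near one length.
protected = [167] * 8 + [166, 168]
# A less protected region: cuts fall more randomly, so lengths spread out.
exposed = [120, 135, 150, 160, 167, 175, 190, 210, 230, 250]

entropy_protected = fragment_length_entropy(protected)
entropy_exposed = fragment_length_entropy(exposed)
```

Under the hypothesis Esfahani described, the higher-entropy region would be the one more likely to correspond to an expressed gene.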
To develop EPIC-Seq, Esfahani and colleagues in the labs of Stanford's Max Diehn and Ash Alizadeh performed 250X genome-wide plasma DNA sequencing on a range of samples, comparing fragment patterns with genome-wide expression to define the relationship between the two. They validated the results in independent whole-genome sequencing datasets from about 1,000 samples.
They used this data to design a targeted sequencing approach for inferring gene expression in a panel of 176 genes with potential clinical relevance and applied it to a cohort of 79 control subjects, 73 non-small cell lung cancer samples, and 92 diffuse large B-cell lymphoma cases, evaluating performance for both cancer detection and biological subclassification.
Using internal cross-validation, the group achieved an area under the receiver operating characteristic curve (AUC) of 0.91 for distinguishing lung cancers from controls using EPIC-Seq, and an AUC of 0.89 for subclassifying lung cancers as either adenocarcinoma or squamous cell carcinoma.
For DLBCL, the AUC for cancer detection was again 0.91, with a clear recapitulation of the same cell-of-origin subtypes that would be determined based on mutation and copy number information. The method also appeared to make accurate subtype-specific prognoses, with the two EPIC-Seq-defined subgroups differing clearly in their progression-free survival.
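For readers unfamiliar with the metric, an AUC can be read as the probability that a randomly chosen case receives a higher score than a randomly chosen control. The sketch below computes it that way by direct pairwise comparison; the function name `roc_auc` and the example scores are hypothetical and are not the study's data or code.

```python
def roc_auc(scores_pos, scores_neg):
    """AUC as the fraction of (case, control) pairs ranked correctly.

    Ties between a case score and a control score count as half a win.
    """
    if not scores_pos or not scores_neg:
        raise ValueError("need at least one score in each group")
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical classifier scores for three cases and three controls.
example_auc = roc_auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is the scale on which the reported values should be read.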
The group now needs to validate the method in independent samples, rather than through internal cross-validation, to see whether the predictive power suggested by the initial AUC calculations holds up. Still, Esfahani said the close correlation of EPIC-Seq scores with both clinical factors and mutation-based subtype determinations bodes well for the approach.
And the data establishes, at the very least, that this novel feature — the entropy of cfDNA fragment size distribution — is significantly correlated with gene expression, he said.