NEW YORK (GenomeWeb) – Researchers have developed an approach to determine whether blood samples contain tumor DNA and, if so, in which tissue the tumor is located.
The approach, called CancerLocator, detects circulating cell-free DNA and uses its genome-wide DNA methylation profile to gauge whether it is derived from a tumor and, if so, which tissue it originated in. Jasmine Zhou of the University of California, Los Angeles, and her colleagues reported in Genome Biology that their probabilistic method was better able to distinguish cancer and non-cancer samples than random forest and support vector machine classification approaches.
Noninvasive diagnosis of cancer could allow earlier diagnosis, and the earlier the cancer is caught, the better the chance patients have of beating the disease, Zhou said in a statement. "We have developed a computer-driven test that can detect cancer, and also identify the type of cancer, from a single blood sample," she said. "The technology is in its infancy and requires further validation, but the potential benefits to patients are huge."
By drawing on the Cancer Genome Atlas DNA methylation data, Zhou and her colleagues developed a database of methylation markers that are common across cancers as well as ones that are specific to certain tissues, focusing on seven cancers that arise in breast, colon, kidney, liver, and lung tissue. They similarly generated a set of methylation markers that are common to healthy tissues. For their tool, they selected CpG clusters that could differentiate tumor types or healthy plasma.
For a given plasma sample, they generated a methylation profile using whole-genome bisulfite sequencing, which then served as the input into the tool to predict, based on the selected CpG clusters, whether that sample harbors tumor DNA and where it might be from.
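The prediction step described above can be illustrated with a small sketch. It is not the CancerLocator model: the article does not spell out the probabilistic method, so this example swaps in a plain least-squares fit. The reference values, the two-tissue panel, the `min_fraction` cutoff, and the function names are all hypothetical and chosen only for illustration.

```python
# Hypothetical reference methylation levels (fraction methylated, 0..1) for
# five CpG clusters. All values are invented for illustration only.
NORMAL_PLASMA = [0.10, 0.80, 0.20, 0.90, 0.15]
TUMOR_REFERENCES = {
    "liver": [0.70, 0.30, 0.25, 0.85, 0.60],
    "lung": [0.15, 0.75, 0.80, 0.20, 0.10],
}


def fit_tumor_fraction(sample, normal, tumor):
    """Least-squares fit of theta in: sample ~ (1 - theta) * normal + theta * tumor.

    Returns (theta, residual), where residual is the squared error of the fit.
    This is a simplified stand-in, not the published probabilistic model.
    """
    diffs = [t - n for t, n in zip(tumor, normal)]
    denom = sum(d * d for d in diffs)
    if denom == 0:
        # Tumor and normal references are identical: nothing to estimate.
        return 0.0, sum((s - n) ** 2 for s, n in zip(sample, normal))
    theta = sum((s - n) * d for s, n, d in zip(sample, normal, diffs)) / denom
    theta = min(1.0, max(0.0, theta))  # a fraction must lie in [0, 1]
    residual = sum(
        (s - ((1 - theta) * n + theta * t)) ** 2
        for s, n, t in zip(sample, normal, tumor)
    )
    return theta, residual


def predict(sample, min_fraction=0.01):
    """Return (label, theta): the best-fitting tissue, or 'non-cancer'.

    min_fraction is an assumed cutoff below which no tumor DNA is called.
    """
    best_label, best_theta, best_residual = "non-cancer", 0.0, float("inf")
    for tissue, reference in TUMOR_REFERENCES.items():
        theta, residual = fit_tumor_fraction(sample, NORMAL_PLASMA, reference)
        if residual < best_residual:
            best_label, best_theta, best_residual = tissue, theta, residual
    if best_theta < min_fraction:
        return "non-cancer", best_theta
    return best_label, best_theta
```

Calling `predict` on a profile that is an 80/20 blend of the hypothetical normal and liver references returns the liver label with an estimated fraction of about 0.2, while the unmixed normal profile is called non-cancer.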
The researchers tested this approach on both simulated and real data. On simulated data, which they generated by computationally mixing methylation profiles from a normal plasma cell-free DNA sample and a solid tumor sample — either breast, colon, kidney, liver, or lung — they found that their approach had a Pearson's correlation coefficient of 0.975 between the predicted and true proportions of circulating tumor DNA.
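As a rough illustration of that simulation, the sketch below mixes a normal profile and a tumor profile at known proportions, estimates the proportion back from the noisy mixture, and computes Pearson's correlation between the true and predicted values. The random profiles, the noise level, the range of fractions, and the least-squares estimator are assumptions made for this example, not details taken from the study.

```python
import math
import random


def mix_profiles(normal, tumor, theta):
    """Blend two methylation profiles; theta is the tumor-derived fraction."""
    return [(1 - theta) * n + theta * t for n, t in zip(normal, tumor)]


def estimate_fraction(sample, normal, tumor):
    """Least-squares estimate of theta (a stand-in for the tool's estimator)."""
    diffs = [t - n for t, n in zip(tumor, normal)]
    numer = sum((s - n) * d for s, n, d in zip(sample, normal, diffs))
    denom = sum(d * d for d in diffs)
    return min(1.0, max(0.0, numer / denom))


def pearson(xs, ys):
    """Pearson's correlation coefficient between two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def simulate(seed=0, n_clusters=200, noise_sd=0.02):
    """Return (true_fractions, predicted_fractions) for invented profiles."""
    rng = random.Random(seed)
    normal = [rng.random() for _ in range(n_clusters)]
    tumor = [rng.random() for _ in range(n_clusters)]
    true_fractions = [i / 100 for i in range(0, 51, 5)]  # 0%, 5%, ..., 50%
    predicted = []
    for theta in true_fractions:
        mixed = mix_profiles(normal, tumor, theta)
        # Add measurement noise and keep methylation levels within [0, 1].
        noisy = [min(1.0, max(0.0, m + rng.gauss(0, noise_sd))) for m in mixed]
        predicted.append(estimate_fraction(noisy, normal, tumor))
    return true_fractions, predicted


true_fractions, predicted = simulate()
r = pearson(true_fractions, predicted)
print(f"Pearson r = {r:.3f}")
```

Because the profiles here are synthetic and the noise is mild, the correlation this toy produces says nothing about the published figure; it only shows how such a figure is computed.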
They also compared the performance of CancerLocator to random forest and support vector machine classification approaches using simulated data representing various disease stages. For early-stage cancers, they reported that CancerLocator outperformed both, with an error rate of 0.067, as compared to 0.735 for random forest and 0.712 for support vector machine. They noted that the random forest and support vector machine approaches did not perform well until the circulating tumor DNA levels exceeded 50 percent.
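The article does not define how the error rates were calculated. Assuming the usual meaning, the fraction of samples assigned to the wrong class, the comparison comes down to a calculation like the one sketched here. The function name and the labels are hypothetical.

```python
def error_rate(true_labels, predicted_labels):
    """Fraction of samples whose predicted class differs from the true class."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("at least one sample is required")
    wrong = sum(1 for t, p in zip(true_labels, predicted_labels) if t != p)
    return wrong / len(true_labels)


# Invented labels: one of four samples is misclassified.
truth = ["liver", "lung", "non-cancer", "breast"]
calls = ["liver", "non-cancer", "non-cancer", "breast"]
print(error_rate(truth, calls))
```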
Similarly, Zhou and her colleagues tested CancerLocator on real data from breast, liver, and lung cancer patients and compared it to random forest and support vector machine approaches. The model itself was developed to distinguish among those three cancers, colon and kidney tumors, and non-cancer samples. Again they found that their approach outperformed the others, with an error rate of 0.265.
"In general, the higher the fraction of tumor DNAs in blood, the more accurate the program was at producing a diagnostic result," Zhou added. "Therefore, tumors in well-circulated organs, such as the liver or lungs, are easier to diagnose early using this approach than in less-circulated organs such as the breast."