BALTIMORE – As whole-genome sequencing (WGS) is increasingly used for clinical applications, Hartwig Medical Foundation is promoting its use for routine cancer diagnostics in the Netherlands.
The nonprofit biomedical institute based in Amsterdam has developed a fully automated cancer WGS data analysis and curation pipeline and is providing a clinical-grade WGS workflow as a diagnostic service to hospitals in the Netherlands. Additionally, it is collecting cancer WGS data to discover new cancer biomarkers, help predict cancer tissue of origin, and augment future cancer care.
In a presentation at the Advances in Genome Biology and Technology (AGBT) annual meeting last week, Edwin Cuppen, Hartwig’s scientific director, told the audience that the challenge for WGS in routine cancer diagnostics is no longer data generation. Instead, it is “how do we get from these vast amounts of whole-genome sequencing data to information that could be relevant for medical specialists, clinical actionability, and … cancer patients,” he said.
To tackle this challenge, his team at Hartwig started to build an integrated data analysis and curation platform roughly five years ago, combining existing open-source tools with software developed in-house to automate and streamline cancer WGS data curation.
According to Cuppen, the pipeline, with all of its tools available on GitHub, can be used with a variety of input data types, including WGS tumor-normal data, WGS tumor-only data, and tumor panel and whole-exome sequencing data. “The advantage of that is you only have to maintain a single pipeline, independent of what type of application you use for your data generation,” he said.
His team implemented the pipeline in a Google Cloud-based computing environment, which Cuppen said affords several advantages. For one, he said cloud computing’s ability to parallelize multiple tasks with adjustable virtual machine size enables streamlined task scheduling. On top of that, he said the researchers can make use of the so-called pre-emptible VMs — basically leftover computing capacity in the cloud — while only paying about 20 percent of the normal cloud computing price.
Of course, data security is a big consideration for cloud computing, he said. To address that, Cuppen noted, his team engineered the pipeline “in such a way that Google can see the bits and bytes, but they cannot see our data.”
Finally, with optimization, his team was able to run the pipeline in the cloud from FASTQ files to patient reports in only 15 hours, bringing the cost for cloud computing to less than $40. Meanwhile, the fully automated pipeline requires no manual intervention from sequencer to report and no third-party hardware. The pipeline’s turnaround time and costs are also independent of the sample size, which matters for routine diagnostic use because hospitals can predict turnaround time, Cuppen said.
At the end of the pipeline, the WGS data is translated into a paper report that summarizes all the clinically relevant information, including not only genetic variants but also a cancer interpretation covering actionable standard of care, therapy guidance, and clinical trial options, according to Cuppen.
Hartwig is also currently providing a clinical-grade WGS sample-to-report workflow as a diagnostic service to hospitals in the Netherlands. The cost for each patient is between €2,500 and €3,000 (about $2,630 to $3,156), which includes four 30X genomes sequenced primarily using Illumina NovaSeq 6000 systems, Cuppen noted after his presentation. Using a recent project as an example, Cuppen showed that the whole process from sample drop-off to returning a clinical report generally takes four to 14 days.
To demonstrate the feasibility of WGS for routine cancer diagnostics, Hartwig recently conducted a study of 1,200 patients, with about one-half of the participants undergoing WGS and the other half receiving standard care.
Presenting unpublished data, Cuppen showed that overall, WGS was successful in leading to a diagnosis in 70 percent of the samples, while the success rate for standard of care was 86 percent. “So there is a discrepancy here,” Cuppen said. But 80 percent of the WGS failures were due to low tumor purity, which he said can be a “bottleneck” for cancer WGS.
Additionally, the data showed that WGS was comparable to standard of care in terms of false positives and false negatives. As for actionability, 14 percent of all patients started a targeted treatment based on a WGS-only diagnosis, compared with 11 percent for standard of care.
All things considered, Cuppen said from a clinical perspective the data from this study is “definitely meeting the quality standards that you need for transferring to a new application of technology.”
Not only is WGS valuable for routine cancer diagnostics, but it is also important for cancer research, Cuppen said. In that regard, Hartwig has been building a cancer WGS database and asking cancer patients during routine care for consent to use their data for research.
According to Cuppen, to date, the database contains WGS data for more than 5,100 patients and is available for free for academic use. In addition to whole-genome data, the database also includes RNA sequencing data for about half of the patients, though it is not used in routine diagnostics.
Lastly, Cuppen’s team sought to use the database to develop a machine learning pipeline to predict tissue of origin for tumors.
“About 3 to 5 percent of patients come to the hospital actually already embedded in metastatic phase without a primary tumor ever found,” Cuppen said. “Basically, there are no treatments possible because virtually every treatment [has] a label for a specific tumor type.”
To that end, his team trained the algorithm with data from about 7,000 cancer patients, combining Hartwig’s database with legacy cohorts covering both primary and metastatic cancers. Currently, the pipeline, named Cancer of Unknown Primary Location Resolver, or CUPLR, covers 35 tumor types, and a yet-to-be-published study on more than 500 independent patient samples showed that the pipeline has a precision of more than 90 percent, Cuppen said.
Additionally, a recent study conducted by his group showed that among 72 cancer patients with no primary tumor found, the algorithm was able to help reach a diagnosis in 37 cases, and in 12 cases, the whole-genome sequencing data was informative enough for a final diagnosis.