Deep Learning Method Helps ID Cancer Drivers Based on Somatic Mutation Rates, Positive Selection

NEW YORK – Researchers from the Massachusetts Institute of Technology, Harvard-MIT Health Sciences and Technology (HST) Program, Broad Institute, and elsewhere have used a computational strategy to find cancer drivers based on somatic mutation rate patterns — work they reported in Nature Biotechnology on Monday.

"[W]e anticipate that deep learning generally, and our tool specifically, can improve computational, experimental, and clinical utility of the growing body of cancer genome sequencing data," co-senior authors Bonnie Berger, a researcher at MIT, the HST Program, and the Broad, and Po-Ru Loh, with the Brigham and Women's Hospital and the Broad, and their colleagues wrote.

"Our mutation maps are publicly available both as an interactive genome browser and as a standalone software tool for quantifying excess somatic mutations anywhere in the genome in a dataset of interest," they added.

With the help of a deep neural network modeling method known as Dig, the team analyzed whole-genome sequence, exome sequence, and targeted sequence data for tumors profiled through the Pan-Cancer Analysis of Whole Genomes effort, together with nucleotide content clues and insights into replication timing, chromatin accessibility, and other epigenetic features drawn from large-scale projects such as ENCODE or the Roadmap Epigenomics effort.

"Our approach's accuracy is attributable, in part, to the ability of the deep learning network to identify local epigenetic structures, such as transcription start sites, and to associate these structures with mutation rates," the authors wrote.

After defining cancer-specific somatic mutation rates at the kilobase scale across the genome in tumors from more than three dozen cancer types, the investigators incorporated probabilistic modeling to home in on sites under positive selection in cancer. That allowed them to search for suspected cancer drivers in coding and noncoding portions of the genome, including those in regulatory elements, cryptic splice sites found in intronic or exonic sequences, untranslated regions, or genes that typically have low or modest mutation rates.
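The general idea of testing for excess mutations against a modeled background rate can be illustrated with a minimal Python sketch. It is not the Dig implementation: the article does not describe the tool's statistical model, so a plain Poisson tail test stands in here as a simplifying assumption, and the region names, rates, and counts are hypothetical values chosen only for illustration.

```python
import math
from typing import NamedTuple


class Region(NamedTuple):
    name: str            # hypothetical label for a genomic element
    length_bp: int       # element length in base pairs
    neutral_rate: float  # assumed background mutations per bp per tumor
    observed: int        # mutations observed across the cohort


def poisson_upper_tail(observed: int, expected: float) -> float:
    """Return P(X >= observed) for X ~ Poisson(expected)."""
    if observed <= 0:
        return 1.0
    if expected <= 0:
        return 0.0
    # Start at the term for k = observed, computed in log space for stability.
    term = math.exp(
        -expected + observed * math.log(expected) - math.lgamma(observed + 1)
    )
    total = 0.0
    k = observed
    # Sum the upper tail until further terms no longer change the total.
    while term > total * 1e-16:
        total += term
        k += 1
        term *= expected / k
    return min(total, 1.0)


def rank_by_excess(regions, n_tumors):
    """Rank regions by how unlikely their mutation count is under the background."""
    rows = []
    for r in regions:
        expected = r.neutral_rate * r.length_bp * n_tumors
        p_value = poisson_upper_tail(r.observed, expected)
        rows.append((r.name, expected, r.observed, p_value))
    return sorted(rows, key=lambda row: row[3])


# Hypothetical inputs: two 1-kb elements in a cohort of 1,000 tumors.
regions = [
    Region("element_B", 1000, 5e-6, 6),
    Region("element_A", 1000, 2e-6, 12),
]
ranked = rank_by_excess(regions, n_tumors=1000)
for name, expected, observed, p_value in ranked:
    print(f"{name}: expected={expected:.1f} observed={observed} p={p_value:.3g}")
```

In this toy setup, an element with many more mutations than its background rate predicts rises to the top of the ranking, while an element whose count is close to expectation does not. A real analysis would also need to account for uncertainty in the estimated background rate and for multiple testing across the genome.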

"Dig provides a tool for in silico guidance of in vitro and in vivo studies because it enables prioritization of precise sets of mutations that may act as drivers in both the coding and noncoding genome," the authors explained. "These specific sets of mutations can then be evaluated in experimental systems."

The researchers found that the Dig approach compared favorably to established mutation burden-based strategies for finding known driver genes from genome or exome sequence data, while pinpointing potential driver elements and other sites of positive selection quickly. Based on these and other results, they suggested that "our method matches or exceeds the power of existing approaches while requiring less runtime and providing flexibility to identify drivers with mutation-level precision genome-wide."

Along with analyses centered on known cancer driver genes, the team used Dig to unearth additional candidate drivers, including intronic cryptic splice site changes falling in suspected tumor suppressor genes or in known tumor suppressors such as TP53 or SMAD4 in a dozen cancer types. Likewise, the search led to small insertions or deletions that were overrepresented in the promoter region of TP53, prompting additional analyses of untranslated regions in 106 tumor suppressor genes and nearly 100 oncogenes.

The investigators used a similar strategy to pick up relatively low-frequency driver genes, including tumor suppressor genes and oncogenes, in exome-sequenced samples from a subset of cancer types, spelling out "long tail" mutation patterns and highlighting rare loss-of-function mutations in drivers such as the DNA mismatch repair genes MSH2 or MLH1.

"Our results represent progress toward an unbiased, pan-cancer catalog of driver genes," the authors wrote, "and suggest that driver mechanisms are shared across the common and rare driver landscape of solid cancers."

Still, they cautioned that computational predictions are a first step in finding driver genes and elements, and cannot be used to confirm a causal role in the absence of more extensive analyses coupled with related functional studies.

"[C]omputational identification of rare driver genes at current sample sizes relies upon small mutation counts, and predictions should be interpreted with care," the study's authors wrote, adding that "experimental validation is necessary to establish the causal role for a mutation as a driver of cancer."