New iCAGES Software Predicts Driver Genes, Drug Response from Patient Cancer Genomes


NEW YORK (GenomeWeb) – Researchers at Columbia University Medical Center and elsewhere have developed a computational tool that they claim is able to rapidly predict genes and variants that are drivers of an individual's cancer and then recommend treatments that are tailored to their tumors.

They describe the so-called Integrated Cancer Genome Score (iCAGES) software in a paper that was published last month in Genome Medicine. The software takes patients' somatic mutation profiles as input and uses this information to prioritize driver mutations, genes, and drugs. According to its developers, iCAGES correctly identified relevant cancer drivers 77 percent of the time when presented with pairs of randomly chosen driver genes and non-driver genes, compared to roughly 51 percent for similar computational tools. 
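The pairwise figure quoted above can be read as a ranking accuracy: pick one known driver gene and one non-driver gene at random, and ask how often the tool scores the driver higher. The sketch below illustrates that calculation only. The function name and the scores are hypothetical, and it is not the evaluation code used in the paper.

```python
from itertools import product


def pairwise_ranking_accuracy(driver_scores, non_driver_scores):
    """Return the fraction of (driver, non-driver) pairs in which the driver
    gene receives the higher score. Ties count as half a correct pair."""
    if not driver_scores or not non_driver_scores:
        raise ValueError("both score lists must be non-empty")
    correct = 0.0
    for d, n in product(driver_scores, non_driver_scores):
        if d > n:
            correct += 1.0
        elif d == n:
            correct += 0.5
    return correct / (len(driver_scores) * len(non_driver_scores))


# Made-up scores for illustration only; they are not taken from the paper.
example_drivers = [0.9, 0.7, 0.4]
example_non_drivers = [0.8, 0.3, 0.2, 0.1]
accuracy = pairwise_ranking_accuracy(example_drivers, example_non_drivers)
print(f"{accuracy:.3f}")
```

Under this reading, a tool that scores genes at random would land near 50 percent, which is why a figure of roughly 51 percent indicates little ability to separate drivers from non-drivers.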

Unlike existing solutions for cancer driver gene identification, which try to identify genes or mutations from groups of patients and focus on specific genomic regions, iCAGES is designed to analyze data from individual cancer patients, Kai Wang, associate professor of biomedical informatics and director of clinical informatics at CUMC's Institute for Genomic Medicine and the study leader, said in an interview. Wang also developed the widely used annotation and interpretation software Annovar, a version of which Tute Genomics, now owned by PierianDx, commercialized in 2013.

In addition, "we try to incorporate as much information as possible in our prediction algorithm," such as prior biological knowledge of cancer genes or cancer pathways, he said. The software will also consider available drugs and clinical trials, as well as molecules that are not yet used clinically but are known to bind and either inhibit or activate specific protein targets.

According to the paper, the iCAGES framework consists of three layers. The first layer takes somatic mutations as input and puts out three types of scores for coding, non-coding, and structural variants. Optionally, users can add in copy number profiles as well as gene expression data from patients. In the second layer, the software links these mutation features to genes, using statistical models and machine learning techniques that match variants to databases of known cancer-causing genes and prioritize the most likely driver genes.

In the third layer, the software matches the list of variants to FDA-approved and experimental drug therapies that specifically address those variants or genes, using a three-step process. First, it queries the BioSystems database for neighboring genes and calculates a relatedness score. Second, iCAGES groups genes into different categories and queries the DGIdb database for each category; it also queries FDA guidelines and clinical trial databases for corresponding targeted drugs in this step. Third, the software calculates the joint probability of a given drug being effective for a particular patient and obtains a drug activity score from PubChem.
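The drug-ranking step described above can be sketched roughly as follows. This is a minimal illustration that assumes the joint probability is the product of three independent values between 0 and 1: a driver score for the mutated gene, a relatedness score linking it to the drug's target, and a drug activity score. The article does not spell out the formula, so the independence assumption, the class and function names, and the placeholder drugs, genes, and numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DrugCandidate:
    drug: str
    target_gene: str
    driver_score: float  # assumed: probability that the mutated gene is a driver
    relatedness: float   # assumed: relatedness between mutated gene and drug target
    activity: float      # assumed: activity score of the drug against its target


def joint_score(candidate):
    """Multiply the three component scores, treating them as independent
    probabilities (an assumption made for this sketch)."""
    parts = (candidate.driver_score, candidate.relatedness, candidate.activity)
    for value in parts:
        if not 0.0 <= value <= 1.0:
            raise ValueError("component scores must lie in [0, 1]")
    return parts[0] * parts[1] * parts[2]


def rank_drugs(candidates):
    """Keep each drug's best joint score and sort drugs from high to low."""
    best = {}
    for candidate in candidates:
        score = joint_score(candidate)
        if candidate.drug not in best or score > best[candidate.drug]:
            best[candidate.drug] = score
    return sorted(best.items(), key=lambda item: item[1], reverse=True)


# Placeholder inputs for illustration only; not real drugs, genes, or scores.
candidates = [
    DrugCandidate("drug_A", "GENE_1", driver_score=0.9, relatedness=1.0, activity=0.8),
    DrugCandidate("drug_B", "GENE_2", driver_score=0.9, relatedness=0.5, activity=0.9),
    DrugCandidate("drug_A", "GENE_3", driver_score=0.4, relatedness=0.6, activity=0.8),
]
ranking = rank_drugs(candidates)
for drug, score in ranking:
    print(drug, round(score, 3))
```

Taking the maximum per drug is one simple way to handle a drug that reaches the patient's mutations through more than one gene; other aggregation rules would be equally plausible.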

In one test that was designed to show how iCAGES could work in practice, the researchers used the software to retrospectively analyze raw sequencing data from a patient who had been diagnosed with lung adenocarcinoma. The data came from a previous publication that described how researchers and clinicians prioritized the ARAF gene out of 129 potential cancer drivers gleaned from the patient's data and selected sorafenib as the most effective drug candidate for the patient out of 122 possible drugs.

Using iCAGES, Wang's team arrived at the same result and did so in a much shorter time frame than the previous study's authors. Since sorafenib is not currently FDA-approved for treating lung cancer, these results suggest that iCAGES "may help clinicians identify an off-label use of existing drugs for rare indications," Wang said. This analysis also compared iCAGES to a similar pathogenic gene prioritization tool called Phen-Gen. In contrast to iCAGES, Phen-Gen ranked the ARAF gene in sixth place for the lung adenocarcinoma patient.

In a separate evaluation that looked at iCAGES' drug prioritization capabilities, the researchers compared its performance to at least one other method, dubbed DGIdb, on three test datasets. In one of the cohorts comprising 146 patients, with 22 patients known to have responded to therapy, iCAGES correctly predicted the therapies used in seven out of nine patients dubbed complete responders and excluded therapies that led to progressive disease in seven out of 13 cases. In contrast, DGIdb predicted only one of the drugs used by the complete responders and excluded therapies that led to progressive disease in five of the 13 cases.
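Expressed as rates, the counts reported above work out as shown in the short calculation below. The counts come from the article; the variable and function names are only illustrative.

```python
from fractions import Fraction

# Group sizes reported in the article for the 22 patients with known responses.
COMPLETE_RESPONDERS = 9
PROGRESSIVE_DISEASE = 13

# Counts reported in the article for each method.
reported = {
    "iCAGES": {"predicted": 7, "excluded": 7},
    "DGIdb": {"predicted": 1, "excluded": 5},
}


def percent(numerator, denominator):
    """Return numerator/denominator as a percentage rounded to one decimal."""
    return round(float(Fraction(numerator, denominator) * 100), 1)


summary = {
    tool: {
        "responder_therapies_predicted_pct": percent(counts["predicted"], COMPLETE_RESPONDERS),
        "failed_therapies_excluded_pct": percent(counts["excluded"], PROGRESSIVE_DISEASE),
    }
    for tool, counts in reported.items()
}

for tool, rates in summary.items():
    print(tool, rates)
```

With groups of nine and 13 patients, each rate rests on a small number of cases, so the percentages are best read alongside the raw counts.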

The next step for iCAGES' developers is to plan pilot prospective clinical trials that will evaluate the translational potential of the software. Wang said that he is in discussions with an unnamed research hospital and pharmaceutical company to plan the trials.

One of the trials will compare treatment selection using information from gene panels versus whole-genome sequencing. "A lot of current cancer precision medicine efforts focus on essentially using a gene panel of 20 or 100 genes to identify somatic mutations from patients and then prescribe drugs based on those somatic mutations," Wang said, but "we demonstrated that the idea of personalizing a treatment strategy based on a [broader] genomic profile will work and in fact can help improve patient survival." The planned trial aims to "reproduce these results in some prospective manner by designing new experiments that investigate whether whole-genome sequencing can really improve patient outcomes compared to just blind treatment or using information only from a gene panel." A second trial, which is currently in the planning stages, will focus on patient outcomes.

In addition, the researchers plan to develop computational methods within iCAGES that will make it possible to integrate different kinds of molecular data, including transcriptome data, copy number alteration data, proteomic data, small RNA data, and methylation data. "I think more sophisticated machine learning [techniques] can help [us] take these heterogeneous datasets and derive a machine learning model that guides us to optimal treatment strategies and decisions," Wang said.