NEW YORK (GenomeWeb) – Scientists from Singapore's Agency for Science, Technology and Research (A*Star) have developed software that sifts through multiple omics datasets to better identify candidate mutations that are likely to play a role in tumor development and survival and could serve as therapeutic targets for new treatments.
In a recent issue of Nucleic Acids Research, the researchers at A*Star's Genome Institute of Singapore (GIS) who developed the so-called OncoIMPACT software describe it as a "data-integration framework to nominate patient-specific driver genes based on their phenotypic impact."
The software, they noted, offered "notable improvements" in precision and robustness over existing approaches when used to explore more than 1,000 melanoma, glioblastoma, prostate, bladder, and ovarian cancer datasets from The Cancer Genome Atlas (TCGA), and also outperformed existing methods when it was applied to data from cancer cell lines. Furthermore, the researchers were able to use the mutational profiles generated by OncoIMPACT to stratify patients according to survival outcomes, demonstrating for the first time, they claim, "the use of a set of computationally identified driver genes as a mutational-status-based signature for tumor stratification and prognostication."
OncoIMPACT is the fruit of a nearly four-year development effort, Niranjan Nagarajan, associate director of computational and systems biology at the GIS and one of the authors on the paper, told GenomeWeb. In the beginning, he and his colleagues were looking for methods that would allow them to combine and analyze multiple datasets as part of a gastric cancer study. While they found several published studies that described integrative analyses of omics datasets, many of these analyzed each dataset independently first and only then combined the results for additional analyses.
That got the GIS team thinking about methods that would enable them to "exploit the synergy between these datasets and put them together [earlier on in the analysis process]," he said. The challenge, he said, was to develop a simple computational model that required only a few parameters to be learned and that struck the right balance between more complex structured models and unstructured association analysis.
"We thought it was going to be an interesting problem … and cancer genomics was an ideal test bed for such ideas because they were already generating these complex rich datasets as part of projects such as TCGA and ICGC," Nagarajan said. However, there is a general need for tools that let researchers combine and explore different datasets to answer questions about complex biological systems, so it's an area that's bound to grow, he added.
According to the NAR paper, OncoIMPACT is designed to integrate information regarding mutations (genomic and epigenomic), changes in cell state (e.g. transcriptome, proteome, epigenome, or metabolome), and gene interaction networks and to use it to identify and rank cancer driver mutations.
It evaluates the impact of different mutations by associating them with "modules of ... deregulated genes through the gene interaction network" using a series of parameters, the paper explains. This process also helps distinguish functional driver mutations from mere passenger mutations that show up in cancer genomes. Finally, nominated mutations are ranked based on the impact they have on the modules that they were associated with in the previous step. Predictions are patient specific, meaning that cancer drivers are selected based on the mutations, pathways, interactions, and so on that are found in each unique sample.
The basic underlying idea of the system is that "if you have a mutation which is presumably a driver mutation, it should have an impact on the cellular system that you are looking at," Nagarajan explained. "So you should see the mutation at the genomic level but then you either see a transcriptomic change or an epigenomic change or a proteomic change," and those changes should be in genes involved in similar pathways and networks, he added.
The software is trained using existing omics information for different cancer types to learn a few parameters that define features such as the neighborhood of a gene and what constitutes a perturbed expression profile. Then when it's presented with new datasets, it can use existing profiles it has for the different cancer types and predict which genes are potential drivers and which are not in a sample-specific manner.
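The general idea of linking a sample's mutated genes to nearby deregulated genes in an interaction network, and then ranking the mutated genes by the size of the module they touch, can be illustrated with a small toy sketch. This is not the OncoIMPACT implementation: the function name, the gene names, the fold-change threshold, and the path-length limit below are all hypothetical stand-ins for the kinds of parameters the article says the software learns from training data.

```python
from collections import deque


def rank_candidate_drivers(network, mutated, log2_fold_change,
                           fc_threshold=1.0, max_depth=2):
    """Toy ranking of one sample's mutated genes (hypothetical sketch).

    A gene counts as deregulated when |log2 fold change| >= fc_threshold.
    Starting from each mutated gene, a breadth-first search walks the
    network through deregulated genes only, up to max_depth hops away.
    The score is the number of deregulated genes reached.
    """
    deregulated = {g for g, fc in log2_fold_change.items()
                   if abs(fc) >= fc_threshold}
    scores = {}
    for gene in mutated:
        seen = {gene}
        module = set()
        queue = deque([(gene, 0)])
        while queue:
            node, depth = queue.popleft()
            if depth == max_depth:
                continue  # do not expand beyond the allowed path length
            for neighbor in network.get(node, ()):
                if neighbor in seen:
                    continue
                seen.add(neighbor)
                if neighbor in deregulated:
                    module.add(neighbor)
                    queue.append((neighbor, depth + 1))
        scores[gene] = len(module)
    # Highest impact first; ties broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Made-up network and expression values for a single sample.
network = {
    "MUT_A": ["D1", "D2"],
    "D1": ["MUT_A", "D2"],
    "D2": ["MUT_A", "D1", "D3"],
    "D3": ["D2"],
    "MUT_B": ["N1"],
    "N1": ["MUT_B"],
}
log2_fold_change = {"D1": 2.5, "D2": -3.0, "D3": 2.2, "N1": 0.1}

ranked = rank_candidate_drivers(network, ["MUT_A", "MUT_B"], log2_fold_change)
print(ranked)
```

In this toy sample, the mutation in `MUT_A` sits next to a connected group of deregulated genes and ranks first, while `MUT_B`, whose only neighbor is expressed normally, scores zero and would be treated as a likely passenger.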
OncoIMPACT addresses methodological challenges suffered by some existing solutions that have been developed to identify functional mutations in cancer and to distinguish these from passenger mutations.
"Recent studies that have cataloged the frequency of mutations in genes based on a large number of patient samples have been quite successful in identifying the major oncogenes and tumor suppressors in a cancer subtype [but] these approaches are not well-suited for identifying rare drivers or patient-specific driver genes even with the use of more sophisticated statistical approaches," the researchers wrote.
Some other approaches try to infer functional mutations from evolutionary conservation and physicochemical information, but these are "restricted to point mutations and were found to lack in accuracy due to a dependence on high-quality training data," the GIS team wrote.
A third category of tools, which use reconstructed interaction networks based on gene co-expression data or molecular networks, can suggest "biologically plausible" candidate driver mutations and work for different sorts of mutations, but are "currently limited to making aggregate predictions for a data set and are not designed to support the sample-specific analysis that would be key for defining personalized cancer management and therapy," the researchers wrote. There are also no currently available methods that can robustly analyze data from cancer cell lines, which often serve as in vitro models for pharmacological investigations, they noted.
The results of comparison studies described in the paper that pitted OncoIMPACT against some of the aforementioned approaches highlight some of the improvements that the GIS tool offers. One experiment, for example, compared OncoIMPACT's performance on a TCGA dataset of over 300 samples each of glioblastoma and ovarian cancers to results provided by an aggregate network-based method and a naive mutation frequency-based approach.
The researchers reported that OncoIMPACT successfully used information on copy numbers, point mutations, and indels to highlight key driver genes. Meanwhile, the frequency-based approach highlighted lesser cancer driver genes; and the results from the aggregate network-based approach, although more on par with OncoIMPACT's findings, still failed to note several known oncogenes that were in the test datasets, according to the paper.
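For contrast, a naive frequency-based ranking of the kind used as a baseline can be sketched in a few lines. The code below is only an illustration of the general technique, not the baseline used in the paper; the sample IDs and gene names are invented.

```python
from collections import Counter


def frequency_ranking(samples):
    """Rank genes by the fraction of samples in which they are mutated.

    `samples` maps a sample ID to the list of genes mutated in it.
    Each gene is counted at most once per sample.
    """
    counts = Counter(gene
                     for genes in samples.values()
                     for gene in set(genes))
    n_samples = len(samples)
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(gene, count / n_samples) for gene, count in ordered]


# Hypothetical cohort of four samples.
samples = {
    "S1": ["GENE_A", "GENE_B"],
    "S2": ["GENE_A", "GENE_B"],
    "S3": ["GENE_A", "GENE_B", "GENE_X"],
    "S4": ["GENE_B"],
}

for gene, fraction in frequency_ranking(samples):
    print(gene, fraction)
```

Because the score depends only on how often a gene is mutated across the cohort, a gene such as `GENE_X` that is mutated in a single sample always lands at the bottom of the list, whatever its effect in that patient. This is the limitation with rare and patient-specific drivers that the researchers describe.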
Another experiment described in the study looked at OncoIMPACT's performance on cell line datasets. Tumor cell lines can differ from the initial parent tumor as a result of adaptations to culturing conditions and may have mutational frequencies that do not reflect those found in fresher samples. Moreover, since cell lines don't have normal controls, efforts to identify somatic variants from these samples are prone to error.
For their analysis, the GIS researchers looked at 47 ovarian and 41 glioma cell lines downloaded from the CCLE data portal. They reported that in spite of the challenges with analyzing cell line data, OncoIMPACT returned results that highlighted known genetic drivers, performing "significantly better" than a competing approach.
The source code and executables for OncoIMPACT are freely available from SourceForge. The current iteration of the OncoIMPACT package includes databases for all five cancer types that the researchers have explored so far. The developers plan to include additional databases containing information from other cancer subtypes studied as part of the TCGA, and are also looking into datasets from the ICGC projects. They're also partnering with investigators at the National Cancer Center of Singapore to use OncoIMPACT to analyze patient samples and identify genetic drivers that could be potential targets for new treatments, Nagarajan said.
The team is also looking into integrating other kinds of omics data that could help improve OncoIMPACT's predictions. For the NAR study, they focused on integrating genome and transcriptome data, but noted that the method should be able to work with other kinds of molecular data such as phospho-proteomic, microRNA, and methylation profiling data.