Computational Methods Help Find New Patterns in Cancer Omics Data


SAN FRANCISCO (GenomeWeb) – In order to move past the idea of one tumor mutation correlating with one targeted therapy, researchers are turning toward computational tools to help make sense of the vast amounts of omics data and identify pathways and previously overlooked networks that correlate with drug response.

Sourav Bandyopadhyay, an assistant professor of bioengineering and therapeutic sciences at the University of California, San Francisco, recently described one such approach in a presentation at the BioData World West conference in San Francisco, California last month and in a follow-up interview. Specifically, his team developed a method to identify and score mutational networks associated with breast cancer and relevant therapies for those networks.

Similar work is being done by researchers at Oregon Health & Science University, where Laura Heiser and colleagues are looking to develop algorithms that can match common gene signature pathways and phenotypes across large cohorts and heterogeneous datasets.

Bandyopadhyay's team developed modular analysis of genomic networks in cancer (MAGNETIC) after struggling to interpret experiments testing drugs on both cancer cell lines and actual tumors, work that the team described in a publication on the bioRxiv server last year.

Cell lines are great for screening drugs, but the results don't always correlate with actual results in patients. "It's not clear what biology is equivalent when you compare cancer cell lines and actual tumors," Bandyopadhyay said. Instead, he wanted a way to use genomic information that took into account the inherent differences between cell lines and actual tumors.

To tackle this problem, Bandyopadhyay's team turned to computational tools and machine learning, focusing first on breast cancer. The researchers developed their method using molecular data from more than 900 breast cancer patient samples that were profiled as part of the Cancer Genome Atlas.

Ultimately, after integrating the data and constructing pathways looking at how the various features interacted, they were able to condense it down to 219 sets of genes within a pathway or network, which they called modules.

They also developed a method to score an individual sample based on which modules are more or less active. The algorithm can process data on somatic mutations, copy number, gene expression, methylation, and some proteomic data, Bandyopadhyay said. The algorithm then figures out which modules are most active.
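The idea of scoring a sample by module activity can be illustrated with a minimal sketch. This is not the MAGNETIC algorithm, whose details are not given here. It assumes a much simpler stand-in: each gene is standardized across samples, and a module's activity in a sample is the mean z-score of its member genes. The gene names, module names, and values are hypothetical.

```python
from statistics import mean, pstdev


def zscore_rows(matrix):
    """Standardize each gene's values across samples (z-score per gene)."""
    standardized = {}
    for gene, values in matrix.items():
        mu = mean(values)
        sd = pstdev(values)
        # A gene with no variation carries no signal; score it as 0.
        standardized[gene] = [(v - mu) / sd if sd > 0 else 0.0 for v in values]
    return standardized


def module_activity(matrix, modules):
    """Return, per module, one activity score per sample.

    Assumption for illustration only: activity is the mean z-score of the
    module's genes that are present in the data matrix.
    """
    z = zscore_rows(matrix)
    n_samples = len(next(iter(matrix.values())))
    scores = {}
    for name, genes in modules.items():
        present = [g for g in genes if g in z]
        if not present:
            continue  # skip modules with no measured genes
        scores[name] = [
            mean(z[g][i] for g in present) for i in range(n_samples)
        ]
    return scores


# Hypothetical toy data: three genes measured in three samples.
expression = {
    "GENE_A": [1.0, 2.0, 3.0],
    "GENE_B": [2.0, 4.0, 6.0],
    "GENE_C": [5.0, 5.0, 5.0],
}
# Hypothetical modules; GENE_X is absent from the data and is ignored.
modules = {
    "module_1": ["GENE_A", "GENE_B"],
    "module_2": ["GENE_C", "GENE_X"],
}

scores = module_activity(expression, modules)
for name, per_sample in scores.items():
    print(name, [round(s, 3) for s in per_sample])
```

In this toy case, module_1 is least active in the first sample and most active in the third, while module_2 scores zero everywhere because its only measured gene does not vary. A real implementation would need to combine several data types, such as mutations, copy number, and methylation, which this sketch does not attempt.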

The modules reflected both known and novel pathways associated with breast cancer. For instance, the researchers identified modules associated with TP53 as well as ER status.

Interestingly, they identified one module that was enriched for genes whose promoters had a specific methylation mark, H3K27me3, which has an unknown role in breast cancer, but is typically associated with repressing genes involved in development and differentiation.

The group also identified modules associated with the tumor microenvironment, for instance one that was enriched for immune system genes, as well as modules associated with non-tumor cell types, like stromal cells.

The H3K27me3 module included around 200 genes. Bandyopadhyay said that the team followed up by doing ChIP-seq to further investigate the chromatin mark and found that its presence or absence in those 200 genes was the main reason why a given gene had high or low expression. For one of the genes, EZH2, he said, there are inhibitors being developed. Other enzymes that mediate the H3K27me3 mark could also be potential drug targets, he said.

The next step would be to test those inhibitors in cell lines to see whether the drugs are effective in the cell lines that have a high activity score for the H3K27me3 module.

The team is now working to develop a "therapeutic map of cancer" by conducting drug screens on cell lines. For effective therapies, they then look to see which modules are the most active. The idea is that module activity will then serve as an indicator for choosing therapies, Bandyopadhyay said.
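One simple way to picture linking drug-screen results to module activity is to correlate each module's activity across cell lines with a measure of drug sensitivity and rank the modules. The sketch below is an assumption made for illustration, not the team's published procedure. The module names, activity scores, and sensitivity values are hypothetical, and Pearson correlation is just one possible choice of association measure.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # undefined correlation; treat as no association
    return cov / sqrt(var_x * var_y)


def rank_modules(module_scores, sensitivity):
    """Rank modules by how strongly their activity tracks drug sensitivity."""
    correlations = {
        module: pearson(activity, sensitivity)
        for module, activity in module_scores.items()
    }
    return sorted(correlations.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical activity scores for two modules across four cell lines.
module_scores = {
    "module_1": [0.1, 0.5, 0.9, 1.3],
    "module_2": [1.0, 0.2, 0.8, 0.1],
}
# Hypothetical sensitivity of the same four cell lines to one drug
# (higher means more sensitive).
sensitivity = [0.2, 0.4, 0.6, 0.8]

ranked = rank_modules(module_scores, sensitivity)
for module, r in ranked:
    print(module, round(r, 3))
```

Here module_1 ranks first because its activity rises in step with sensitivity, while module_2 is negatively correlated. With real screens, many drugs and modules would be tested at once, so some correction for multiple comparisons would be needed before treating any module as a response indicator.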

Bandyopadhyay added that another next step is to use MAGNETIC to identify modules related to other cancers and to see whether they are shared or different among different cancer types. His team has collections of around 12 different tumor types consisting of hundreds of samples each.

In addition, he said, the UCSF team is looking to use the strategy on high-risk patients who have exhausted standard-of-care therapies to see if they can identify a potential drug that hadn't previously been tried.

Bandyopadhyay's group is also interested in using MAGNETIC to study tumor heterogeneity, and would like to incorporate single-cell RNA-seq data into the model.

Meanwhile, at OHSU, Heiser and her colleagues are looking to develop computational methods to identify gene expression signatures in cancer.

"It's a particularly exciting time," Heiser said in a recent interview. "There's just an incredible amount of data available for us to think about mining. Now, the big bottleneck is trying to understand that data and extract biological meaning."

Heiser's team is focused on taking a network biology approach to look at changes in DNA, RNA, and protein and identify patterns that may help explain a particular phenotype.

On the one hand this involves developing new algorithms, which she is doing in collaboration with Mehmet Gonen, an assistant professor of biomedical engineering at OHSU.

Similar to Bandyopadhyay's work on the MAGNETIC tool, Heiser is developing algorithms that are agnostic with regard to the data type, whether gene expression data, protein data, or sequence data.

In addition, she said, some of the signatures that have been identified in her lab work will be used in the context of a clinical trial, Serial Measurements of Architecture and Theranostics (SMART), that will focus on "trying to make sense of patients' molecular profiles." Specifically, she said the trial will focus on RNA-seq data from breast, prostate, and pancreatic cancer, as well as acute myeloid leukemia, and will look for around 30 transcriptional signatures that indicate potential treatment with an approved drug. The trial will be led by Joe Gray and Raymond Bergan.

Heiser said that machine learning approaches for data analysis have some advantages over scientists sifting through data to look for signatures and patterns. For one, as increasingly large and complex datasets are generated, analysis becomes more time-consuming and computational approaches can help distill and prioritize the data, she said. In addition, machine learning can "uncover novel patterns that may point to new aspects of the biology that we've not yet considered," she said. "It removes some of the unconscious habit we have of gravitating toward known biology."