This article's title has been changed to reflect the fact that the software was primarily developed at Harvard University, not the Broad Institute, as the previous title suggested.
NEW YORK (GenomeWeb) – Researchers from the Broad Institute, Harvard University, and collaborating institutions have published a paper in Nature Methods describing the most recent incarnation of StratomeX, visualization software developed and used to explore data from The Cancer Genome Atlas (TCGA). The software includes new capabilities that could help researchers better characterize subtypes of cancer and other diseases, and that could ultimately lead to more targeted therapies and better outcomes for patients.
This latest version of StratomeX builds on an earlier one, first developed in 2011 and described in a 2012 Computer Graphics Forum paper. That paper presented StratomeX as a visualization tool designed to help investigators integrate and explore "the relationships of candidate [cancer] subtypes across multiple genomic data types such as gene expression, DNA methylation, or copy number data" with an eye toward assessing the effects of these subtypes "on molecular pathways or outcomes such as patient survival."
Now, according to the Nature Methods paper, the improved version of StratomeX "integrates a computational framework for query-based guided exploration of [patient stratifications] directly into the visualization," and this addition enables the "discovery of novel relationships between patient sets and efficient generation and refinement of hypotheses about tumor subtypes." The paper includes detailed instructions for defining queries. For example, users can ask the software to highlight similar stratifications, or a pathway that is enriched in a specific patient group. StratomeX then uses multiple algorithms to rank the datasets and stratifications that match each query.
Essentially, the new query engine lets researchers group data in different ways that could reveal potentially interesting relationships worth further study, Nils Gehlenborg, a research associate in Peter Park's laboratory at Harvard Medical School (HMS) and a co-author on the Nature Methods paper, told BioInform.
Previously, researchers needed to have some idea of what they were hoping to find in the data when they used StratomeX to group patients, he said. With the new query engine, "we can not only look at something that we think is interesting, but we can also identify potentially interesting stratifications of the patients based on [the] existing stratifications that we are already looking at," he explained. For example, a user analyzing multiple clusters of patients stratified by gene expression can ask the software to search for common copy number changes that could be correlated with the activation or deletion of a tumor suppressor gene.
The developers have also linked this new version of StratomeX to the Broad Institute's Firehose analysis pipeline, which was used to analyze data generated by TCGA. The system includes code that researchers can use to pull analysis results for any of the more than 30 tumor types studied as part of the project, Gehlenborg said. The data import process has also been automated, so users no longer have to download the datasets and then re-upload them for viewing in StratomeX.
Full details of the updates to StratomeX are available on the software's website and in the paper's supplementary material in Nature Methods. The supplement includes a video summarizing how the software works, as well as a case study demonstrating the software's ability to characterize tumor subtypes. For that study, which is also described in detail in the supplement, Gehlenborg and his colleagues explored molecular and clinical data from a TCGA cohort of more than 400 clear cell renal cell carcinoma cases.
One of the goals of their analysis was to show that with StratomeX, "it is possible to essentially reproduce some of the results that were reported in the original publication that described that subproject of TCGA," he said.
An earlier study from the TCGA consortium had found a large overlap between patient clusters based on mRNA expression data and clusters based on microRNA expression data, so the StratomeX team set out to see whether they could reproduce that finding. Using the mRNA expression clusters as the starting point, they asked the software to find stratifications in the TCGA data whose clusters overlapped with them. The known microRNA clusters were among the top results returned for the query, according to the researchers.
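The article does not say how StratomeX scores the overlap between clusters. Purely as an illustration of the idea behind such a query, the Python sketch below ranks groups from candidate stratifications by their Jaccard index with a query group of patients. The Jaccard index is an assumed stand-in, not necessarily the measure StratomeX uses, and the function names and patient IDs are hypothetical.

```python
# Illustrative sketch only: the Jaccard index is an assumed stand-in for
# whatever scoring StratomeX actually uses; all names and data are hypothetical.

def jaccard(a: set, b: set) -> float:
    """Jaccard index |A & B| / |A | B| of two sets of patient IDs."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_overlaps(query_group: set, stratifications: dict) -> list:
    """Score each group of each candidate stratification against query_group.

    Returns (score, stratification_name, group_name) tuples, best match first.
    """
    hits = []
    for strat_name, groups in stratifications.items():
        for group_name, patients in groups.items():
            hits.append((jaccard(query_group, patients), strat_name, group_name))
    hits.sort(key=lambda hit: hit[0], reverse=True)
    return hits


# Hypothetical toy data: one mRNA-based cluster and two candidate stratifications.
MRNA_CLUSTER = {"P1", "P2", "P3", "P4"}
CANDIDATES = {
    "miRNA clusters": {"m1": {"P1", "P2", "P3"}, "m2": {"P4", "P5", "P6"}},
    "copy number clusters": {"c1": {"P1", "P5"}, "c2": {"P2", "P3", "P4", "P6"}},
}

if __name__ == "__main__":
    for score, strat, group in rank_overlaps(MRNA_CLUSTER, CANDIDATES):
        print(f"{score:.2f}  {strat} / {group}")
```

With this toy data, microRNA cluster m1 ranks first (score 0.75), followed by copy number cluster c2 (0.60). A real tool would also have to account for cluster size and chance overlap, for example with a statistical test, rather than relying on a raw overlap score.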
Another query the researchers ran as part of the study asked StratomeX to identify and rank genetic mutations that adversely affect patient survival. The software identified BAP1 as one of the top genes in this category, a fact borne out by another published TCGA study.
A second goal of the study was to demonstrate how the software could help generate reliable, testable hypotheses and observations from data. For this part of the analysis, the researchers asked StratomeX to examine protein, mRNA, and microRNA expression, as well as DNA methylation patterns, and to identify the cluster of patients with the worst survival outcome. The software found a cluster of patients, defined by protein expression, that fit the bill, according to the paper.
To check the finding, the researchers then turned to the mRNA expression levels of the patients in this group and, using gene set enrichment analysis, identified several affected pathways related to DNA repair, Gehlenborg said. Other characteristics of patients in this group included higher mutation rates and loss of the BRCA2 gene, which is also involved in DNA repair. The researchers also examined clinical information from these patients and found that most had stage 3 or stage 4 cancer, which, they noted, could account for the low survival rates.
Although StratomeX was developed for oncology, it isn't limited to cancer and can be applied to other diseases, the developers said. Planned improvements for future versions include tools that provide a more fine-grained view of the genome, so that users can explore genetic mutations in greater detail, and a web-based version of the software, Gehlenborg said. The team also plans to add more detailed visualization capabilities.