Cancer researchers faced with an increasing variety of omics data sets must often combine several analysis packages into a complex workflow, a situation that has hindered the promise of systems biology.
In an effort to address this challenge, researchers at the British Columbia Cancer Agency Research Centre and the University of British Columbia have developed an “integrative” software tool that allows users to visualize and analyze genome, epigenome, and transcriptome data with a single interface.
The software, called SIGMA2, builds on the System for Integrative Genomic Microarray Analysis, or SIGMA, tool that the developers created in 2006 to analyze array comparative genomic hybridization data, but has been expanded to integrate the analysis of several different data types. This integrative approach, the developers note in a recent paper in BMC Bioinformatics describing the tool, is expected to facilitate “high-throughput systems biology analysis of cancer.”
User-friendliness was an important focus of development, Raj Chari, first author on the paper and a PhD student at BCCA, told BioInform.
Chari said that many of his colleagues at BCCA are not routine users of bioinformatics or biostatistics tools such as R. “We put these things in the program so that they are hidden,” he said. “Everything has nice radio buttons and windows.”
Integrative software tools are particularly needed in cancer research, Chari and his colleagues note in the paper, since cancer involves multiple types of cellular disruption. Those changes include genetic alterations such as mutations, changes in gene dosage, and allelic imbalance, as well as epigenetic alterations such as changes in DNA methylation and histone-modification states.
Since one or several of these factors can impact a given gene, and because individual tumors may show the same phenotype caused by various patterns of disruptive events, a multi-dimensional approach can help identify the causal events at the DNA level and understand their downstream consequences, the authors explained.
Public and Private
There are a number of commercial and academic software packages that analyze a single type of omics data, such as gene expression or CGH data, but the options for cross-platform analysis are limited, the researchers said.
“Though different software can analyze data generated from different platforms, the ability to perform meta-analysis using data from multiple microarray platforms is limited to a small number of software packages,” they wrote. “Consequently, integrative analysis of cancer genomes typically involves no more than two types of data, most commonly the integration of gene dosage and gene-expression data and recently expanded to integrating allelic information.”
According to the BMC Bioinformatics paper, SIGMA2 lets users integrate genomics, epigenomics, and transcriptomics data from various platforms so they can combine the different assay results, such as when they have identified changed genes in terms of copy number variation, loss of heterozygosity, DNA methylation, or expression.
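As a rough illustration of what combining assay results can mean in practice, the sketch below counts, for each gene, how many data dimensions flag it as altered. It is not SIGMA2 code: the function name, the `min_dimensions` threshold, and the gene labels are all hypothetical, and the paper may define integration differently.

```python
from collections import Counter
from typing import Dict, Iterable, List


def genes_altered_in(dimensions: Dict[str, Iterable[str]],
                     min_dimensions: int = 2) -> List[str]:
    """Return genes flagged as altered in at least `min_dimensions` data types.

    `dimensions` maps a data-type label (for example "copy_number") to the
    genes called as changed in that assay. Labels and genes are placeholders.
    """
    counts: Counter = Counter()
    for genes in dimensions.values():
        # set() guards against a gene being listed twice within one assay
        counts.update(set(genes))
    return sorted(g for g, c in counts.items() if c >= min_dimensions)


# Hypothetical per-assay gene lists
calls = {
    "copy_number": ["GENE_A", "GENE_B"],
    "methylation": ["GENE_B", "GENE_C"],
    "expression": ["GENE_A", "GENE_B"],
}
print(genes_altered_in(calls, min_dimensions=2))
print(genes_altered_in(calls, min_dimensions=3))
```

Raising `min_dimensions` narrows the list to genes disrupted by several mechanisms at once.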
However, SIGMA2 isn’t the only academic integrative tool. SNPExpress, for example, lets scientists combine gene expression, DNA copy number, and genotype data.
Peter Valk, head of molecular diagnostics at the Institute of Haematology at the Dutch Erasmus Medical Center, developed SNPExpress with colleagues at the Dana-Farber Cancer Institute and the Broad Institute of MIT and Harvard. He told BioInform in an e-mail that SNPExpress offers the many options for visualizing data from various omics platforms that SIGMA2 includes.
“In fact, we changed SNPExpress such that it is much more flexible in visualizing omics data,” he said. “However, the analysis tools in SIGMA2 make [it] a more comprehensive tool to study these types of data.”
Soheil Shams, CEO of BioDiscovery, said that his firm’s Nexus CGH is also an integrated tool, and can combine expression, copy number, gene, and miRNA data.
Version 4 of Nexus Copy Number, scheduled for release soon, will also support methylation and ChIP-on-chip data integration, Shams told BioInform via e-mail.
In the paper, Chari and colleagues include a table that compares SIGMA2 with several other software packages, including Nexus CGH. According to the comparison, SIGMA2 offers a number of features that are not available in any other resource, such as consensus calling with multiple segmentation algorithms; multi-dimensional visualization of genetic, epigenetic, and gene-expression data; and links to external gene expression data repositories.
But Shams noted that the table excluded many features that are unique to Nexus, such as clustering based on genomic-copy number profiles and aggregate profiles based on sample phenotype.
Shams also criticized the use of public funding to support an expanding set of publicly available software tools that are, in his opinion, not of the same quality as commercial tools. “Creating a software tool should be for commercial companies that do a better job and can support the products long term,” he said.
SIGMA2 is an expansion of the SIGMA resource that Chari and his colleagues developed in 2006 to help researchers mine and analyze array CGH data. “It was essentially a database application,” he said, explaining that he and his colleagues developed it to manage their own wealth of CGH profile data on cancer cell lines.
SIGMA includes high-resolution whole-genome array CGH profiles of 200 cancer cell lines profiled on four different platforms. On top of this database, the scientists created a web interface for users to comparatively analyze multiple genomes.
With SIGMA2, Chari and his colleagues tried to capture the usability features of SIGMA and its interface, he said. “A lot of the look and feel was taken from that version and what we have done in the second version is add a lot more functionality to it.”
With the new tool, he said, scientists can import their data and “do more rigorous analysis.”
Some of the software development from SIGMA transferred to SIGMA2, such as the code for the MySQL connection and many elements of the user interface. “There was about 20 to 25 percent we could definitely reuse,” he said.
According to the researchers, SIGMA2 offers features for handling microarray data, integrating copy-number data, aligning and analyzing genetic and epigenetic data, and visualizing genome features.
“Now people are more understanding that we need to merge data from different platforms, so that we can better utilize the data,” Chari said. “That is where the spirit for the development of SIGMA2 came about.”
One feature the scientists highlight in the paper is the ability of the software to let users “take a consensus of multiple algorithms” using “and/or” logic.
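A minimal sketch of what “and/or” consensus calling could look like, assuming each algorithm emits one call per probe (+1 for gain, -1 for loss, 0 for no change). The call encoding, function name, and tie-breaking rule are assumptions made for illustration; the article does not describe how SIGMA2 implements this.

```python
from typing import Dict, List


def consensus_calls(calls_by_algorithm: Dict[str, List[int]],
                    mode: str = "and") -> List[int]:
    """Combine per-probe calls (+1 gain, -1 loss, 0 none) from several algorithms.

    mode="and": keep a call only when every algorithm agrees on it.
    mode="or":  keep a call when at least one algorithm makes it and no
                algorithm makes the opposite call (an assumed tie-break).
    """
    if mode not in ("and", "or"):
        raise ValueError("mode must be 'and' or 'or'")
    tracks = list(calls_by_algorithm.values())
    if not tracks:
        return []
    if any(len(t) != len(tracks[0]) for t in tracks):
        raise ValueError("all algorithms must call the same probes")

    result = []
    for probe_calls in zip(*tracks):
        states = set(probe_calls)
        nonzero = states - {0}
        if mode == "and":
            # unanimous agreement, including unanimous "no change"
            result.append(probe_calls[0] if len(states) == 1 else 0)
        else:
            # exactly one kind of alteration reported across algorithms
            result.append(next(iter(nonzero)) if len(nonzero) == 1 else 0)
    return result


# Hypothetical calls for five probes from three algorithms
calls = {
    "algo_a": [1, 1, 0, -1, 0],
    "algo_b": [1, 0, 0, -1, 1],
    "algo_c": [1, 1, 0, 1, 0],
}
print(consensus_calls(calls, "and"))  # [1, 0, 0, 0, 0]
print(consensus_calls(calls, "or"))   # [1, 1, 0, 0, 1]
```

The “and” mode trades sensitivity for specificity, while “or” does the reverse; conflicting gain and loss calls are dropped in both modes here.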
Users can compare two genomic subtypes of lung or breast cancer by statistically analyzing the differences between them and visualizing the results. “That is where you would be using a two-group comparison,” Chari said. “In terms of doing a multi-group comparison, moving beyond two groups would be something we would consider adding on in the future.”
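The article does not say which statistical test SIGMA2 applies in a two-group comparison. As one generic possibility, the sketch below runs an exact permutation test on the difference in group means for a single gene or probe. The function name and the sample values are hypothetical, and the approach is only practical for small groups because it enumerates every split.

```python
from itertools import combinations
from statistics import mean
from typing import Sequence


def permutation_p_value(group_a: Sequence[float],
                        group_b: Sequence[float]) -> float:
    """Two-sided exact permutation test on the difference of group means."""
    pooled = list(group_a) + list(group_b)
    n_a, n_b = len(group_a), len(group_b)
    observed = abs(mean(group_a) - mean(group_b))
    total = sum(pooled)

    extreme = 0
    n_splits = 0
    # Enumerate every way of relabelling n_a of the pooled samples as group A
    for idx in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in idx)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        # small tolerance so floating-point noise does not drop exact ties
        if diff >= observed - 1e-12:
            extreme += 1
        n_splits += 1
    return extreme / n_splits


# Hypothetical log2 copy-number ratios for one probe in two tumor subtypes
subtype_1 = [0.8, 0.9, 1.1, 1.0]
subtype_2 = [0.0, 0.1, -0.1, 0.05]
print(permutation_p_value(subtype_1, subtype_2))
```

With four samples per group there are 70 possible splits, so the smallest attainable two-sided p-value is 2/70; larger cohorts would call for random sampling of permutations or a parametric test instead.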
One challenge in multi-platform analysis is file format conversion, and SIGMA2 doesn’t do away with the need for some data pre-processing. For example, Affymetrix SNP array data must be pre-processed and normalized before being imported into the software.
“It’s a very difficult problem, because everyone has different file formats. In terms of [being] a one-stop shop, it is getting there, but it will take more time,” Chari said.
SIGMA2 was developed for Windows, but Chari said he is working on porting it to Linux and Mac OS. The software is currently closed-source, “but if people are interested we will definitely talk to them,” he said.
Ron Lauener, scientific development officer in the BC Cancer Agency’s Technology Development Office, told BioInform that the software is freely available through an academic software license agreement.
“We want to make this advanced piece of software freely available to anybody and everybody who wants to use it, but there are certain restrictions,” he said. Namely, the software is available “for non-profit research or educational purposes only,” and it cannot be modified, decompiled, sublicensed, or sold to third parties.
A commercial entity interested in using the software for commercial purposes could potentially enter into a separate commercial license agreement with the BCCA, he said.