Ninety-five new web-based genomic data-analysis resources are discussed in a special issue of Nucleic Acids Research published online this week.
NAR's "Web Server" issue is the eleventh such issue devoted to software for the analysis and visualization of molecular biology data. According to the new issue's lead editorial, authored by Executive Editor Gary Benson, the latest issue emphasizes network and pathway analysis, high-throughput sequencing data analysis, and biological text mining.
At the same time, some of the tools discussed were developed specifically for the analysis of microarray data. BioArray News spoke this week with the creators of a number of the tools featured in the new issue, including the designers of ArrayAnalysis, a tool for microarray quality control and pre-processing; TMA Navigator, a web application for the analysis of tissue microarray data; PATHiWAYS, a pathway visualization system; Graphite Web, a tool for gene set analysis that exploits pathway topology; and PlantGSEA, a web server focused on plant organisms.
In each case, the software developers cited an unmet need in array data analysis as motivating them to create and distribute their own free online tools.
A team of bioinformaticians led by the department of bioinformatics at Maastricht University in the Netherlands is behind ArrayAnalysis. In the NAR paper discussing the tool, the authors write that while quality control is "crucial" for any scientific method producing data, adoption of algorithms for QC, pre-processing, and normalization of Affymetrix expression chips, many of which are available through Bioconductor for use with the statistical interface R, has been "hampered by lack of integrative tools that can be used by users of any experience level."
The authors created ArrayAnalysis to "fill this gap" by "extending, integrating, and harmonizing" the functionality of various Bioconductor packages. The portal includes interpretation help and detailed technical documentation with the ultimate aim of "improving data quality and reuse and adoption of standards."
Chris Evelo, head of Maastricht University's bioinformatics department and corresponding author on the NAR paper, said that there are several ideas behind ArrayAnalysis. According to Evelo, many institutes already have the statistical and quality control procedures featured in the tool in place, but they have "never been well integrated" into analysis workflows, and so his team "improved on that" by making data analysis and visualization easier and more consistent.
He noted that the tool is also available to researchers working at institutes that currently lack their own QC pipelines.
"I now and then have had to review array papers and asked for quality control," said Evelo. "Wetlab scientists often complained they didn't have the expertise or the tools," he said. "Now they do."
He added that ArrayAnalysis could enable further analysis of existing array data sets, as well as provide confidence in the quality of that data.
"There is a lot of data, for over a million published arrays, freely available out there," said Evelo. "You could do a lot of interesting stuff with that data if you could check its quality and normalize it in a standardized way."
Evelo said that his team continues to develop ArrayAnalysis. Parallel pipelines for Illumina arrays and for two-color arrays, such as Agilent Technologies chips, are already in beta testing, as are pipelines for chromatin immunoprecipitation and methylated DNA immunoprecipitation array analysis.
Bioinformaticians at the University of Edinburgh's Institute of Genetics and Molecular Medicine developed a software tool with a very different target user group from ArrayAnalysis' intended audience.
Called TMA Navigator, U of Edinburgh's tool is for scientists running tissue microarrays, which, according to the NAR paper, allow multiplexed analysis of tissue samples and are frequently used to estimate biomarker protein expression in tumor biopsies.
Co-creator Alex Lubbock said that the Human Genetics Unit within IGMM decided to create TMA Navigator following discussions with clinical and wet-lab research colleagues, "based on the need for easy to use yet powerful tools" to analyze TMA data.
"While tools exist for TMA data management and image processing, there was no purpose-built software available for analyzing TMA marker scores [or] protein expression, with the scope our colleagues require," Lubbock said.
Among the capabilities touted in the paper are algorithms for mitigating batch effects and grouping patient samples according to marker scores. TMA Navigator also supports network inference approaches that could "offer insights into the molecular logic underlying pathophenotypes," the authors claim.
Lubbock said that his collaborators have found the network inference tool in particular to be "useful … for elucidating marker relationships within the specific context of a patient cohort of interest."
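As a rough illustration of what inferring a marker network from TMA scores can involve, the sketch below links two markers whenever their scores are strongly correlated across a patient cohort. It is a generic correlation-thresholding example, not TMA Navigator's own method; the function names, the fixed 0.7 cutoff, and the marker names and scores are all hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("constant scores have no defined correlation")
    return cov / sqrt(var_x * var_y)


def marker_network(scores, threshold=0.7):
    """Return (marker_a, marker_b, r) for every pair with |r| >= threshold.

    `scores` maps a marker name to its per-patient scores, with patients
    in the same order for every marker. The threshold is an arbitrary
    illustrative choice.
    """
    edges = []
    for a, b in combinations(sorted(scores), 2):
        r = pearson(scores[a], scores[b])
        if abs(r) >= threshold:
            edges.append((a, b, r))
    return edges


# Hypothetical marker scores for four patients.
example_scores = {
    "ER": [1, 2, 3, 4],
    "PR": [2, 4, 6, 8],
    "HER2": [4, 1, 3, 2],
}
example_edges = marker_network(example_scores)
```

With these made-up scores, only the ER and PR pair clears the cutoff. Choosing that cutoff by hand is the kind of step that automated edge thresholding would aim to replace.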
The "major strength" of the TMA Navigator approach is that it allows "hypothesis-driven and exploratory analysis of TMA data in a non-biased, data-led manner," Lubbock noted. "Users need no specialist computational or statistical knowledge to use the service and the user guide provides guidance with interpretation," he said.
Lubbock said that his team is eager to introduce TMA Navigator to clinicians and researchers, particularly in the pathology and oncology communities, and that several labs are already regular users of the tool. His team also plans to develop TMA Navigator further.
"Our lab is particularly interested in network biology, so we're exploring ways to add to the existing network analysis options, for example with automated edge thresholding approaches and possibilities for producing directed marker networks," he said.
A number of papers in the new Web Server issue were related to NAR's theme of network and pathway analysis. One tool highlighted was PATHiWAYS, which was developed in the lab of Joaquín Dopazo in the department of computational genomics at the Centro de Investigación Príncipe Felipe in Valencia, Spain.
As discussed in the paper, though there are web tools available that allow viewing and editing pathways, "few methods aim to identify the signaling circuits within a pathway" and "none of them provide a convenient graphical web interface."
To tackle this issue, Dopazo's lab created PATHiWAYS, a web-based signaling pathway visualization system that infers changes in signaling that affect cell functionality from the measurements of gene-expression values in typical expression microarray case-control experiments.
"What inspired me to create this tool was the frustrating poor information that we currently extract from the raw gene expression level measurements and its connection to the real cell functionality," Dopazo said. "The PATHiWAYS algorithm transforms the abstract, and context-less, parameter of gene expression level into a meaningful value: the probability of signal transmission in a pathway or, in other words, the capability of the cell to trigger a functionality as response to a particular stimulus."
Patricia Sebastián-León, lead author on the NAR paper and a scientist in Dopazo's lab, said that the tool's developers plan to expand it to cover new species and microarray platforms. "In addition, we are working on interactive visualization of the results," she said. According to Dopazo, other planned upgrades include supporting different types of pathways, such as metabolic pathways, or more abstract protein interaction graphs, such as the interactome.
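The idea of turning expression measurements into a probability of signal transmission can be illustrated with a toy model. The sketch below is not the PATHiWAYS algorithm: it assumes each node in a signaling route already has an activation probability between 0 and 1, multiplies those probabilities along a route, and treats alternative routes as independent. The function names and the example values are hypothetical.

```python
from math import prod


def chain_probability(node_probs):
    """Probability that a signal passes every node of one linear route."""
    for p in node_probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    return prod(node_probs)


def circuit_probability(routes):
    """Probability that at least one alternative route transmits the signal.

    Routes are treated as independent, which is a simplifying assumption
    made only for this illustration.
    """
    blocked = prod(1.0 - chain_probability(route) for route in routes)
    return 1.0 - blocked


# Hypothetical receptor-to-effector route in control and case samples;
# the middle node is far less likely to be active in the case sample.
control = chain_probability([0.9, 0.8, 0.9])
case = chain_probability([0.9, 0.2, 0.9])
```

In this made-up example the transmission probability falls from about 0.65 in the control to about 0.16 in the case, a difference that is easier to interpret than three separate expression values.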
The University of Padua's GraphiteWeb is another tool for pathway analyses and network visualization of gene expression data. However, unlike CIPF's PATHiWAYS tool, GraphiteWeb allows the analysis of both microarray and RNA-seq data. According to the paper describing the tool, GraphiteWeb implements five different gene set analyses on three model organisms and two pathway databases.
"One of the main difficulties we see in transcriptomic projects is the interpretation of the huge amount of results that classical statistical methods provide," said lead author Gabriele Sales, a staff scientist in U of Padua's department of biology. "We believe that biological pathways — manually curated collections of reactions among cellular elements — provide a powerful guide in interpreting such results," he said.
Over the past few years, Sales' lab has developed a computational framework called Graphite to provide a uniform view over the data stored in different pathway databases. It also developed a methodological approach to analyze transcriptome data that makes use of pathway topology to gain statistical power.
According to Sales, GraphiteWeb combines these two veins of research with visualization tools. "Data visualization is extremely challenging, but we believe it represents a fundamental aspect of how a software application helps the users to interpret biological results," he said.
Encouraged by the large number of users in the first month of operation, Sales said his lab is already working on expanding the number of organisms and on adding direct links to the publications that were used to annotate the individual interactions inside the pathways. The lab also plans to include non-coding RNAs in pathway annotations. Additionally, Sales said the lab is investigating a new visualization layout that "looks less like a generic network and more similar to the textbook representation of a pathway."
Researchers at China Agricultural University in Beijing have expanded on EasyGO and agriGO, two gene ontology tools that provide annotations for defining gene sets, with the launch of PlantGSEA.
According to the NAR paper, PlantGSEA is based on the Broad Institute's gene set enrichment analysis methodology, which determines whether an a priori defined set of genes shows statistically significant, concordant differences between two phenotypes.
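For readers unfamiliar with the approach, the core of a gene set enrichment test can be sketched in a few lines. The example below is a simplified, unweighted running-sum score with a gene-label permutation test, assuming genes are already ranked by their difference between the two phenotypes. It is not PlantGSEA's or the Broad Institute's implementation, and the function names, gene IDs, and permutation settings are hypothetical.

```python
import random


def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score over a ranked gene list.

    Walk down the ranking, stepping up at each gene in the set and down
    at each gene outside it, and return the largest deviation from zero.
    Assumes each gene appears once in `ranked_genes`.
    """
    members = set(gene_set) & set(ranked_genes)
    n_hit = len(members)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the ranking without covering it")
    running = 0.0
    best = 0.0
    for gene in ranked_genes:
        running += 1.0 / n_hit if gene in members else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


def permutation_p_value(ranked_genes, gene_set, n_perm=1000, seed=0):
    """Share of shuffled rankings scoring at least as extreme as observed."""
    rng = random.Random(seed)
    observed = abs(enrichment_score(ranked_genes, gene_set))
    shuffled = list(ranked_genes)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(enrichment_score(shuffled, gene_set)) >= observed:
            extreme += 1
    return (extreme + 1) / (n_perm + 1)


# Hypothetical ranking of ten genes, most up-regulated first.
ranking = ["g%d" % i for i in range(1, 11)]
top_score = enrichment_score(ranking, {"g1", "g2", "g3"})
bottom_score = enrichment_score(ranking, {"g8", "g9", "g10"})
```

A set concentrated at the top of the ranking scores close to +1 and a set concentrated at the bottom scores close to -1, which is what "concordant differences" means in practice. Published GSEA implementations typically weight each step by the gene's correlation with the phenotype and permute sample labels rather than genes.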
Lead author Gabriel Yi said that CAU's PlantGSEA focuses on plant organisms and relies on 20,290 defined gene sets derived from different resources. Currently, gene locus IDs and Affymetrix microarray probe set IDs from four plant model species — Arabidopsis thaliana, Oryza sativa, Zea mays, and Gossypium raimondii — are included.
"The lack of an integrative gene functional analysis toolkit in the plant community drove us to develop this tool," Yi said this week. "We believe it will bring our users a reliable solution to mine biological facts underlying the high-throughput data since our gene annotations are heavily literature-based."
He added that his lab is developing tools for use with ChIP, DNase, and global run-on sequencing data.