A Dim Summary of Microarray Software


New microarray software is pouring out of the kitchen like dim sum carts at a Chinese brunch. So many exotic looking morsels — which ones to try? If you’re a novice like me, you can’t even tell fish from fowl, beef from pork, main course from dessert.

That’s OK at brunch, where the worst that could happen is you bite into something you hate. But it’s not so good when you’re picking microarray software, every piece of which takes a lot of time to find, install, and try out.

A year ago in this column I conducted taste-tests of eight microarray products and reviewed six more as a mere observer. In this, my second annual review of microarray software, I’ll describe and list the main ingredients of 41 packages.

Dumplings and chicken’s feet

Some of the products are complete meals with dishes for each of the four courses I defined last year — normalization, filtering, clustering (which I now prefer to call numerical analysis), and biological interpretation — while others are just nibbles that satisfy specific analytical needs. There are desktop products intended for individual diners, and enterprise products that are banquets for the whole company. For the daring gastronome, the spiciest software tools are academic packages that implement the latest and greatest ideas of leading researchers — these can turn an ordinary meal into a special feast.

Most full-course products offer a bland, standard bill of fare. For normalization, they offer various scaling options to correct for hot and cold chips, accompanied by a platter of simple mathematical operations (such as taking logarithms, calculating ratios or fold change, and centering) for changing how the data look.
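
As a rough illustration of these standard operations, here is a minimal Python sketch. It is not taken from any product reviewed here: the two-chip data set, the choice to scale each chip to a common mean, and all function names are assumptions made for the example.

```python
import math

def scale_to_mean(chip, target):
    """Scale a chip so its mean intensity equals target (corrects hot or cold chips)."""
    factor = target / (sum(chip) / len(chip))
    return [x * factor for x in chip]

def log2_ratios(sample, reference):
    """Per-gene log2 ratio of sample to reference; +1 means a two-fold increase."""
    return [math.log2(s / r) for s, r in zip(sample, reference)]

def center(values):
    """Subtract the mean so the values average to zero."""
    m = sum(values) / len(values)
    return [v - m for v in values]

# Hypothetical intensities for four genes on two chips; the second chip runs "hot".
reference = [100.0, 200.0, 400.0, 800.0]
hot_chip = [400.0, 400.0, 1600.0, 3600.0]

target = sum(reference) / len(reference)   # common mean used for scaling
scaled = scale_to_mean(hot_chip, target)   # hot chip brought down to the reference level
ratios = log2_ratios(scaled, reference)    # log2 fold change per gene
centered = center(ratios)                  # ratios shifted to average zero
```

Scaling to a common mean is only one of several possible corrections; real products typically offer a menu of alternatives.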

For filtering, they let you discard data whose expression levels or changes are too low for your appetite, or select data based on properties like the functional classification of a gene.

For numerical analysis (main course), most products serve profile search (which lets you find expression profiles similar to a given one), clustering (hierarchical and otherwise), and principal components analysis (PCA). Several products also entice you with classification methods such as neural nets or support vector machines. For dessert (biological interpretation), most products provide links to external gene sequence or function databases, or permit users to establish such links.
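
To make profile search concrete, here is a minimal Python sketch that ranks genes by how closely their expression profiles track a query profile. Using Pearson correlation as the similarity measure, the 0.9 cutoff, and the toy gene names and values are all assumptions for illustration; individual products may use other distance measures.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length profiles (neither may be constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)

def profile_search(query, profiles, min_r=0.9):
    """Return (gene, correlation) pairs with correlation >= min_r, best match first."""
    hits = [(name, pearson(query, p)) for name, p in profiles.items()]
    hits = [(name, r) for name, r in hits if r >= min_r]
    return sorted(hits, key=lambda h: h[1], reverse=True)

# Hypothetical expression profiles over four time points.
profiles = {
    "geneA": [0.1, 0.9, 2.1, 2.9],   # rises steadily
    "geneB": [3.0, 2.0, 1.0, 0.0],   # falls steadily
    "geneC": [1.0, 1.2, 0.9, 1.1],   # roughly flat
}
query = [0.0, 1.0, 2.0, 3.0]          # the pattern we are looking for
hits = profile_search(query, profiles)
```

With these toy values only the steadily rising gene clears the cutoff; the falling gene is perfectly anti-correlated with the query and is excluded.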

Statistics is the latest taste sensation. A year ago, only the most avant-garde products cooked with statistics, but today most products at least dabble in this spice. Statistical methods are becoming more sophisticated as well, reflecting the tremendous work being done by academic microarray statisticians.

What follows is a lot to digest, but I hope my quick tour of microarray dim sum will help you separate the shu mai from the wontons. You may not get exactly what you want, but at least it will be off the right cart.


Full course commercial dinners for one


Here is the list of full-course, commercial, desktop products. I only mention unusual features of each one, leaving unstated the common features described above.


BioMine, Gene Network Sciences
Unique clustering methods; statistical methods for validating clusters; experiment design tool

BioMiner, MicroDiscovery
Novel normalization methods; support vector machines

GeneLinker, Molecular Mining
Novel mutual nearest neighbors clustering

GeneMaths, Applied Maths
Statistical methods for validating clusters; discriminant analysis

GenePlus, Enodar
Regression techniques to assess significance of expression changes

GeneSight, BioDiscovery
Novel normalization methods to correct dye bias; discriminant analysis

GeneSpring, Silicon Genetics
A market leader that represents the gold standard for this class of product. Discriminant analysis; scripting tools

J-Express, MolMine
Academic version still available from GeneX. Sammon maps; multidimensional scaling

Partek, Partek
A rising statistical heavyweight. More than 20 distance measures for clustering; neural nets, discriminant analysis; multidimensional scaling; scripting tools

Pathways 4, ResGen
Modular architecture for plug-in extensions

Spotfire DecisionSite for Functional Genomics, Spotfire
A market leader that integrates microarray tools with Spotfire’s visualization engine. Formerly called Array Explorer. Offers few novel microarray features per se, but the integration with Spotfire’s visualization tools is delectable. Can cluster on text, ontology classifications, etc., in addition to expression values

Xpression NTI, InforMax
Can filter data by variability, e.g., to eliminate data deemed to be unreliable; novel QT-clustering; Sammon maps


Full course academic dinners for one


Now for the academic meals:


BRB ArrayTools, Richard Simon, NCI
Excel plug-in; statistical methods for validating clusters; novel classification method; multidimensional scaling

Cluster, Michael Eisen, LBNL
A market leader that pioneered clustering and other aspects of microarray analysis; its data format is a de facto standard. It has no unique features, because everyone has copied it.

MAExplorer, Laboratory of Experimental and Computational Biology, NCI
Java program that can run as standalone application or applet

TIGR MultipleExperiment Viewer (TMEV), The Institute for Genomic Research
Java application

XCluster, Gavin Sherlock, Stanford /cluster.html
Another pioneering academic program, similar to Cluster; runs on Unix and Linux


Commercial nibbles


The next group of snacks offers unique features for specific problems. Except as noted, all are desktop products.


ArrayStat, Imaging Research
Robust statistical methods to estimate measurement error

BioinformatiXEngine, Xpogen
Web-based product intended for use on an intranet. Novel clustering method based on relevance networks; modular architecture for plug-in extensions

OmniViz Pro, OmniViz
Impressive collection of novel visualization and dimensional reduction methods

Visual Gene, Visipoint
Uses self-organizing maps (SOMs) for analysis and visualization, in contrast to most products, which use SOMs only for clustering



Commercial banquets


Enterprise products provide an integrated multi-course banquet — consisting of tools and a central database — to feed an entire research organization. Several of the vendors go further and offer a complete meal plan of software tools for other areas of bioinformatics. These products are great if you like the cuisine.


The market leaders in this category are GeneData's Expressionist and Rosetta's Resolver. Resolver was one of the first commercial products to cook with statistics; it calculates error estimates and propagates them through the analysis. The two largest bioinformatics software vendors, Lion Bioscience and InforMax, also have products in this category: ArrayScout and GenoMax Gene Expression Module, respectively. A fascinating new product is GeneTraffic from Iobion Informatics, a network appliance that runs on dedicated, inexpensive Linux computers.


Bring on the spice


The really hot items are the academic dishes that push the frontiers of microarray analysis. This software is not for the faint of heart. Some programs are command line utilities, and many others are code libraries or subroutines. A few have Web versions, but usually these are just demos that offer a quick taste. Much of this software is open source, and some of it is available from the GeneX project at the National Center for Genome Resources; GeneX also operates a website where these tools can be tried out.

Several of the programs implement versions of a technique called borrowing power, described in the Borrowing Power box below.

BCLUST, Hongyu Zhao, Yale University School of Medicine
Statistical method for validating clusters using bootstrapping

CLEAVER (Classification of Expression Arrays), Russ Altman, Stanford University
Web server that provides k-means clustering, discriminant analysis, and PCA

CLICK, Ron Shamir and Roded Sharan, Tel Aviv University
Novel clustering algorithm that uses graph-theoretic and statistical techniques

CLUSFAVOR, Leif Peterson, Baylor
Bayesian methods for normalization; factor analysis (similar to PCA)

CyberT, Tony Long, University of California, Irvine
Part of GeneX. Borrows power and then uses a Bayesian model to assess the significance of expression changes

GEDA: Gene Expression Data Analysis, Christina Kendziorski, University of Wisconsin
A highly referenced program that also has a Web version and can be accessed via email. Borrows power and then uses a Bayesian model to assess the significance of expression changes.

K-means Integrated Models for Oligonucleotide Arrays (Kimono), Ian Holmes, Berkeley Drosophila Genome Project
Jointly clusters promoter sequences and expression profiles to find promoters that regulate various genes

MA-ANOVA programs for microarray data, Gary Churchill, Jackson Laboratory
Implements pioneering ANOVA error model that handles many kinds of measurement errors

Brian Yandell, University of Wisconsin /tr1031.html
Borrows power and uses results to improve measurements of low-abundance transcripts

PaGE, Christian J. Stoeckert, Penn Center for Bioinformatics, University of Pennsylvania
Borrows power and then computes confidence levels for direction, but not magnitude, of expression change.

Plaid, Laura Lazzeroni and Art Owen, Stanford University
Implements new “fuzzy” clustering method that clusters genes and samples simultaneously. Not open source.

RCluster, Karen Schlauch, National Center for Genome Resources rcluster/help.html
Part of GeneX. Implements several standard clustering methods, and statistical method for validating clusters using bootstrapping.

SAM: Significance Analysis of Microarrays, Rob Tibshirani, Stanford University
Excel plug-in that correlates gene expression data with clinical parameters

SMA: Statistics for Microarray Analysis, Terry Speed, University of California, Berkeley zarray/Software/smacode.html
Influential suite of programs, providing basic microarray statistical routines. Also provides normalization functions that correct dye bias and print tip effects.

SVDMAN: Singular Value Decomposition Microarray Analysis, Michael Wall, Los Alamos National Laboratory
Uses singular value decomposition (similar to PCA) to partially cluster genes; also calculates confidence measures for clusters.

VERA: Variability and Error Assessment & SAM: Significance of Array Measurement, Trey Ideker, Institute for Systems Biology andSAM/?id=yvfw4
A pair of programs for assessing significance of expression changes using statistical error models


Borrowing Power

A pressing issue in microarray statistics is finding ways to increase power without increasing the number of replicates. This reflects the cold reality that microarray experiments are too expensive for statisticians to do as many as they would like.

One approach is to combine data from multiple genes to better estimate the variance of each one. An (overly) simple idea is to assume that all genes are subject to the same amount of uncontrolled variation. Given this assumption, we can combine the measurements for all genes into one large pool, increasing the effective sample size from the number of replicates (a small number like two or three) to the number of replicates multiplied by the number of genes — a large number like 20,000-30,000, which is certainly large enough to make accurate statistical estimates.
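
The pooling idea in the paragraph above can be sketched in a few lines of Python. This is only an illustration under the (overly) simple shared-variance assumption; the function name, the gene labels, and the replicate values are all made up for the example and do not come from any package listed in this column.

```python
from statistics import variance

def pooled_variance(replicates_by_gene):
    """Pool within-gene sample variances, weighting each by its degrees of freedom.

    Assumes every gene shares one underlying variance. Genes with fewer
    than two replicates carry no variance information and are skipped.
    """
    weighted_sum = 0.0
    dof = 0
    for values in replicates_by_gene.values():
        n = len(values)
        if n < 2:
            continue
        weighted_sum += (n - 1) * variance(values)
        dof += n - 1
    return weighted_sum / dof, dof

# Hypothetical log ratios: three genes, two replicates each.
data = {
    "g1": [1.0, 3.0],
    "g2": [2.0, 2.0],
    "g3": [0.0, 4.0],
}
pooled, dof = pooled_variance(data)
```

Each gene alone offers a single degree of freedom for estimating its variance; the pooled estimate rests on one per gene, so with tens of thousands of genes the estimate becomes very stable, provided the shared-variance assumption holds.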

The rub, of course, is that the basic assumption is false. The idea can be resurrected by adopting the weaker assumption that all genes of a given expression level show the same variation or, better, that expression level is a major component of the variation. This is a promising idea that is being pursued in different forms by many microarray statisticians.
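
One simple way to act on the weaker assumption is to pool only among genes of similar expression level. The Python sketch below does this by sorting genes by mean expression and splitting them into equal-sized bins. The binning scheme, the function name, and the data are assumptions for illustration; the academic packages listed in this column each pursue the idea in their own, more sophisticated, form.

```python
from statistics import mean, variance

def variance_by_level(replicates_by_gene, n_bins=2):
    """Estimate each gene's variance by pooling with genes of similar expression.

    Genes are sorted by mean expression and split into n_bins groups of
    (nearly) equal size; every gene in a group receives that group's pooled
    variance. Assumes every gene has at least two replicates.
    """
    genes = sorted(replicates_by_gene, key=lambda g: mean(replicates_by_gene[g]))
    size = -(-len(genes) // n_bins)  # ceiling division
    estimates = {}
    for start in range(0, len(genes), size):
        group = genes[start:start + size]
        weighted_sum = sum(
            (len(replicates_by_gene[g]) - 1) * variance(replicates_by_gene[g])
            for g in group
        )
        dof = sum(len(replicates_by_gene[g]) - 1 for g in group)
        for g in group:
            estimates[g] = weighted_sum / dof
    return estimates

# Hypothetical log intensities: two low-expression genes with noisy replicates
# and two high-expression genes with tight replicates.
data = {
    "low1": [1.0, 3.0],
    "low2": [2.0, 4.0],
    "high1": [10.0, 10.2],
    "high2": [12.0, 12.2],
}
estimates = variance_by_level(data)
```

In this toy data set the low-expression genes end up with a much larger variance estimate than the high-expression genes, which a single global pool would have averaged away.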

— NG


Nat Goodman, PhD, helped found the Whitehead/MIT Center for Genome Research, directed a bioinformatics group at the Jackson Laboratory, led a bioinformatics marketing team for Compaq Computer, and has been consulting ever since. He is currently a free agent in Seattle. Send your comments to Nat at [email protected]

