Faced with the limitations of commercial gene expression analysis software, Robert Stuart of the University of California, San Diego, took matters into his own hands. “We found that currently available tools didn’t do what we wanted it to, so I had to write my own,” said Stuart, a nephrologist and assistant professor of medicine at UCSD.
“We found that you couldn’t get very far into the data analysis without making your own.”
The programs, Equalizer and eBlot, were central to a recent experiment led by UCSD’s Sanjay Nigam that identified 873 rat kidney genes that play a key role in development, out of 8,740 genes present on Affymetrix GeneChips. The research was published in the May 1 issue of the Proceedings of the National Academy of Sciences.
The UCSD researchers were able to cluster the genes into five distinct groups, based on their peak expression during development. The result is a step-by-step view of the development of a mammalian kidney, from the embryonic stage to its adult role.
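The article does not say how the UCSD team formed its five groups, so the following is only a minimal sketch of the general idea of grouping genes by when their expression peaks. The function name, the stage labels, and the input layout (one list of expression values per gene, one value per developmental stage) are all assumptions made for illustration, not the researchers' method.

```python
from collections import defaultdict


def group_by_peak(expression, stages):
    """Group genes by the stage at which their expression is highest.

    expression: dict mapping a gene name to a list of expression values,
                one value per developmental stage (hypothetical layout).
    stages:     list of stage labels, in the same order as the values.
    """
    groups = defaultdict(list)
    for gene, values in expression.items():
        if len(values) != len(stages):
            raise ValueError(f"{gene}: expected {len(stages)} values, got {len(values)}")
        # Index of the largest value; the earliest stage wins on a tie.
        peak_index = max(range(len(values)), key=lambda i: values[i])
        groups[stages[peak_index]].append(gene)
    return dict(groups)
```

Called with five stage labels running from embryo to adult, a function like this would return up to five groups, one for each stage at which some gene peaks. A real analysis would more likely use a proper clustering algorithm on the full expression profiles.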
“The gene clusters we ultimately arrived at were utterly understandable in biological terms,” Stuart said. “In other words, we got what we considered to be a right answer. Almost every gene that’s ever been described as important in some way in kidney development, we found.”
Equalizer is a microarray data normalization tool, while eBlot is a database application that links curated functional data in the Gene Ontology Consortium database and source library information in dbEST with sequences present on the chips.
Stuart found that normalization was a particular stumbling block with commercial gene expression analysis software. One program the research team considered “was flipping all negative expression values to positive without telling us,” Stuart said.
Equalizer takes a non-linear approach to normalization. For any pairwise comparison, genes are ranked in both gene lists in order of signal intensity and then re-sorted. Genes that have the same or a similar rank in both lists make up a subset that traces the central tendency of the data, and that subset is used as the basis for straightening out the rest of the data. This “data equalization” approach, which uses a stereo equalizer-type “slider” to change the values in a given range, gave the program its name.
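Equalizer's code is not shown in the article, so the following is a rough sketch of the rank-based idea described above rather than Stuart's implementation. It assumes two lists of intensities for the same genes, treats genes whose rank shifts by no more than a chosen fraction of the list as the stable subset, summarizes that subset with binned medians, and uses piecewise-linear interpolation to map the sample onto the reference scale. The function names, the `max_rank_shift` and `n_bins` parameters, and the choice of binned medians are all assumptions.

```python
from bisect import bisect_left
from statistics import median


def ranks(values):
    """Return the rank (0 = lowest) of each value; ties are broken by position."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order):
        result[index] = rank
    return result


def interpolate(x, knots_x, knots_y):
    """Piecewise-linear map through the knots; end segments are extended outward."""
    if len(knots_x) == 1:
        return x + (knots_y[0] - knots_x[0])  # a single knot only allows a shift
    hi = min(max(bisect_left(knots_x, x), 1), len(knots_x) - 1)
    lo = hi - 1
    slope = (knots_y[hi] - knots_y[lo]) / (knots_x[hi] - knots_x[lo])
    return knots_y[lo] + slope * (x - knots_x[lo])


def equalize(reference, sample, max_rank_shift=0.05, n_bins=10):
    """Rescale `sample` onto the scale of `reference` using rank-stable genes."""
    if len(reference) != len(sample):
        raise ValueError("both lists must cover the same genes")
    n = len(reference)
    ref_rank, smp_rank = ranks(reference), ranks(sample)

    # Genes whose rank is (nearly) the same in both lists trace the central trend.
    stable = [i for i in range(n)
              if abs(ref_rank[i] - smp_rank[i]) <= max_rank_shift * n]
    if not stable:
        raise ValueError("no rank-stable genes found; try a larger max_rank_shift")
    stable.sort(key=lambda i: sample[i])

    # Summarize the stable genes as one median point per intensity bin.
    n_bins = min(n_bins, len(stable))
    knots_x, knots_y = [], []
    for b in range(n_bins):
        chunk = stable[b * len(stable) // n_bins:(b + 1) * len(stable) // n_bins]
        kx = median(sample[i] for i in chunk)
        ky = median(reference[i] for i in chunk)
        if not knots_x or kx > knots_x[-1]:  # keep knot positions strictly increasing
            knots_x.append(kx)
            knots_y.append(ky)

    # Apply the curve fitted on the stable genes to every gene in the sample.
    return [interpolate(v, knots_x, knots_y) for v in sample]
```

In this sketch, genes that truly change between the two arrays fall outside the stable subset, so they do not influence the curve but are still rescaled by it. A production tool would likely fit a smoother curve than the piecewise-linear one used here.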
The eBlot database was developed to link the genes that are available on Affy’s arrays with publicly available databases in order to provide complete annotation on gene function, cellular location, and other information. “The problem is there’s no common nomenclature for gene orthologs across species, across databases, so you can’t link up information easily,” Stuart said.
Stuart used BLAST comparisons to link the Affy target sequences to the dbEST and GO databases. Links to other data sources are planned for eBlot in the future, and Stuart is also working on a metacomparison method to link information from different species.
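How eBlot processes its BLAST results is not described, but the linking step can be illustrated with a short sketch. It assumes the BLAST output is in the standard 12-column tab-delimited format (query ID, subject ID, percent identity, alignment length, mismatches, gap opens, query start and end, subject start and end, e-value, bit score) and keeps the highest-scoring hit for each query below an e-value cutoff. The function name and the cutoff value are assumptions chosen for illustration.

```python
import csv


def best_hits(blast_lines, max_evalue=1e-10):
    """Map each query sequence to its best-scoring database hit.

    blast_lines: an iterable of tab-delimited BLAST result lines,
                 for example an open file in 12-column tabular format.
    max_evalue:  hits with a larger e-value are ignored (assumed cutoff).
    """
    best = {}
    for row in csv.reader(blast_lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment headers
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue > max_evalue:
            continue
        # Keep only the hit with the highest bit score for each query.
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}
```

The resulting dictionary could then serve as a lookup table from chip target sequences to database records, which is the kind of cross-reference the article describes.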
Equalizer and eBlot, in addition to a still-unnamed statistical method for generating p-values for microarray data, will soon be freely available for academic users at http://organogenesis.ucsd.edu.