John Weinstein recalled that his microarray group at the National Cancer Institute faced a very big question: "Now that I've done the experimental work, and I've done the statistical work, and I have my list of changed genes, what the hell does that list mean biologically?" In the hope of getting at least a bit closer to the answer, a team of developers from NCI, Georgia Tech, Emory University, and SRA International developed a software tool called GoMiner that organizes genes according to biological function using the Gene Ontology.
The tool, a client-server application available at http://discover.nci.nih.gov/gominer and http://www.miblab.gatech.edu/gominer, was developed with microarray data in mind, but co-developer Barry Zeeberg of the NCI said it works with any high-throughput genomic or proteomic technology. The key to the system is the GoMiner engine, which automatically assigns a gene association and a GO category to each gene in the input gene list.
Before GoMiner was developed, Weinstein said, the only available option was to manually look up every gene of interest one at a time in the literature and other data sources in order to gauge its biological function. GoMiner, on the other hand, categorizes the list of changed genes within the context of every gene on the entire array in a matter of minutes — a process that Zeeberg estimated would take "literally a lifetime" to do manually for some arrays.
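To make the idea of categorizing changed genes "within the context of every gene on the entire array" concrete, the following is a minimal Python sketch, not GoMiner's actual code. The gene names, category labels, and the `categorize` helper are all hypothetical; the sketch only tallies, for each category, how many genes on the array fall into it and how many of those are on the changed list.

```python
from collections import defaultdict

# Hypothetical annotations (gene -> categories); illustrative only,
# not real GoMiner or Gene Ontology data.
ANNOTATIONS = {
    "geneA": {"apoptosis"},
    "geneB": {"apoptosis", "cell cycle"},
    "geneC": {"cell cycle"},
    "geneD": {"transport"},
    "geneE": {"apoptosis"},
    "geneF": {"transport"},
}


def categorize(all_genes, changed_genes, annotations):
    """Return {category: (changed_count, total_count)} over the whole array."""
    counts = defaultdict(lambda: [0, 0])
    changed = set(changed_genes)
    for gene in all_genes:
        for category in annotations.get(gene, ()):
            counts[category][1] += 1      # every gene on the array counts toward the total
            if gene in changed:
                counts[category][0] += 1  # only genes on the changed list count here
    return {category: tuple(pair) for category, pair in counts.items()}


# Assume geneA and geneB are the "changed" genes from an experiment.
summary = categorize(ANNOTATIONS.keys(), ["geneA", "geneB"], ANNOTATIONS)
for category in sorted(summary):
    changed_n, total_n = summary[category]
    print(f"{category}: {changed_n} of {total_n} genes changed")
```

Comparing the changed count with the total for each category is what lets a category stand out against the background of the full array; any statistical test applied on top of these counts is outside the scope of this sketch.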
The genes are displayed in a tree structure based on the GO hierarchy, as well as in the form of a directed acyclic graph (DAG), a representation of hierarchical data that permits some categories to have more than one parent. The DAG visualization is rendered in Scalable Vector Graphics (SVG), so users can mouse over nodes to view selected genes or click on any node to view its position in multiple pathways.
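The difference between a strict tree and a DAG can be illustrated with a short Python sketch. The category names and the `paths_to_root` helper below are hypothetical and do not reflect GoMiner's internals or the actual GO hierarchy; the point is only that a category with two parents is reachable by more than one route from the root.

```python
# Hypothetical fragment of a GO-like hierarchy: category -> list of parents.
# The structure is illustrative and not taken from the real Gene Ontology.
PARENTS = {
    "biological_process": [],
    "cell death": ["biological_process"],
    "development": ["biological_process"],
    "programmed cell death": ["cell death", "development"],  # two parents
}


def paths_to_root(term, parents):
    """Return every path from `term` up to a root, each as a list of terms."""
    if not parents[term]:
        return [[term]]  # a root has exactly one path: itself
    paths = []
    for parent in parents[term]:
        for path in paths_to_root(parent, parents):
            paths.append([term] + path)
    return paths


paths = paths_to_root("programmed cell death", PARENTS)
for path in paths:
    print(" -> ".join(path))
```

In a tree view the same category would have to be shown once under each parent, while a DAG view can draw it as a single node with two incoming edges.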
GoMiner also links each gene in the tree view to its corresponding page in LocusLink, PubMed, MedMiner, GeneCards, KEGG, and BioCarta. Zeeberg said this list would continue to grow.
In addition to the Java-based GUI, a command-line version of GoMiner is available for developers who would like to use it for higher-throughput experiments or integrate it with other applications. The command-line option and the DAG capability set GoMiner apart from similar programs such as Onto-Express and MappFinder, according to Zeeberg.
Zeeberg said the development team of about a dozen programmers relied on "agile computing" (an iterative approach of which extreme programming is one variant) to write GoMiner in four months. The team plans to continue developing the software over the next nine months or so to hone its usability features as well as its statistical capabilities, and Zeeberg added that the developers welcome any feedback or suggestions for improvement.
A paper describing GoMiner in detail recently appeared in Genome Biology [http://genomebiology.com/2003/4/4/r28], and nearly 1,600 readers had accessed the article as of April 10.