New MAGE-ML Paper Offers Biologist-Friendly View of Microarray Data Exchange Format


A recent paper in Genome Biology describing the MAGE-ML microarray gene expression markup language should help communicate the benefits of the format to biologists who are not familiar with the nuts and bolts of software engineering, according to lead author Paul Spellman of the University of California, Berkeley.

The MAGE specification, which was approved by the Object Management Group’s Life Sciences Research Domain Task Force more than six months ago, is currently detailed in an 80-page OMG document that’s “extremely verbose” and simply “not readable,” Spellman said. The Genome Biology paper highlights the key aspects of the format of interest to microarray users, with the hope that they can “get their IT people to start looking at it and decide whether to support it,” Spellman said.

The paper, available at research/0046, outlines the MAGE-OM object model, the MAGE-ML XML representation of MAGE-OM, and the MAGE Software Toolkit (MAGE-STK). MAGE-OM contains 132 classes grouped into 17 packages that reflect many of the core requirements of the MIAME standard. MAGE-ML translates MAGE-OM into a data format to facilitate the exchange of microarray data, while MAGE-STK is a suite of APIs that can be used to export data to MAGE-ML, store data in a relational database, or supply input to analysis software. Implementations in Perl and Java are currently available at MAGE/magestk.html.
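For readers who want a feel for what it means to translate an object model into an XML exchange format, the following is a minimal sketch in Python. It is not MAGE-ML and does not use MAGE-STK: the class names (Hybridization, Measurement), the element names, and the helper functions are all hypothetical stand-ins chosen for illustration, and a real MAGE-OM document involves far more classes and packages.

```python
from dataclasses import dataclass, field
from typing import List
import xml.etree.ElementTree as ET


@dataclass
class Measurement:
    # Hypothetical class: one reporter's value in one hybridization.
    reporter_id: str
    value: float


@dataclass
class Hybridization:
    # Hypothetical class standing in for a single array experiment.
    identifier: str
    measurements: List[Measurement] = field(default_factory=list)


def to_xml(hybs: List[Hybridization]) -> str:
    """Serialize the toy object model to XML (element names are illustrative)."""
    root = ET.Element("ExpressionData")
    for hyb in hybs:
        hyb_el = ET.SubElement(root, "Hybridization", identifier=hyb.identifier)
        for m in hyb.measurements:
            ET.SubElement(
                hyb_el, "Measurement", reporter=m.reporter_id, value=repr(m.value)
            )
    return ET.tostring(root, encoding="unicode")


def from_xml(text: str) -> List[Hybridization]:
    """Rebuild the toy objects from the XML produced by to_xml."""
    root = ET.fromstring(text)
    return [
        Hybridization(
            identifier=h.get("identifier"),
            measurements=[
                Measurement(m.get("reporter"), float(m.get("value")))
                for m in h.findall("Measurement")
            ],
        )
        for h in root.findall("Hybridization")
    ]
```

The point of the sketch is the round trip: one group writes its objects out with to_xml, another reads them back with from_xml, and both end up with the same data without sharing a database schema.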

Spellman said the publication of the paper is something of a milestone for the project, but the authors have several other near-term goals ahead. One is the final vote on the specification by the OMG, expected to occur at the group’s next meeting in early September, which would give the go-ahead for a MAGE v1.0 release.

Adoption of the standard has been steady — Rosetta Biosoftware, Affymetrix, the EBI, the NCI, TIGR, and other organizations have already implemented MAGE-compliant software in production settings, and other groups are coming on board. Iobion CSO Jason Goncalves said the company plans to implement MAGE-ML export into GeneTraffic by late fall, while Molecular Mining is “eager to implement” the format, according to Don Van Dyke, vice president of sales and marketing, “but right now we’re looking for users to show us how they want it used.”

Indeed, usage requirements are currently the key bottleneck in pushing the project ahead. While people understand that standardization of microarray data is an important issue, many are “still not sure how they may adopt [standards],” said Spellman. Adding MAGE-ML export to a microarray database is a big step toward MIAME compliance and should take a competent programmer only about a week, he said, but many challenges remain. For example, he noted, the current specification offers limited support for cluster analysis, so MAGE developers are working on a way to add support for different clustering and analysis methods as well as the exchange of conclusions based on microarray data. “It’s not just the raw data, it’s what you did with it that’s important to researchers,” said Spellman.

For bioinformatics vendors like Molecular Mining that focus on the data mining aspects of microarray experiments, this is a key issue in determining when and how to support MAGE in future versions of their software, Van Dyke said.

For Iobion, which provides a database as part of its offering, data export in MAGE-ML is more of a near-term goal, said Goncalves. “A year ago, when we launched GeneTraffic, which supported MIAME, initially nobody asked for MIAME support,” he said, but “now you hear more questions about whether it’s MIAME supportive … I think the same will be true for MAGE-ML.” In particular, Goncalves noted, the capability should be of interest to researchers who are required by funding agencies to make their microarray data publicly available. “A key limiting factor to that is whether or not you’re actually going to be able to successfully exchange that data,” he said.

