NEW YORK, Sept. 6 - A working group trying to tame the bewildering variety of microarray data formats has created a new markup language designed to make it easier for researchers to transmit and share data.
That new data format, called Microarray Gene Expression Markup Language, or MAGE-ML, creates a syntax that can manage the enormous number of variables involved in microarray experiments, and provides a mutually intelligible format to permit data merges or comparisons.
Designed to be used with both commercial and spotted arrays, the language can describe microarray design, manufacturing information, experiment setup, data, and analysis results. A paper describing the format appears in Genome Biology.
MAGE-ML working group chair Paul Spellman, who is a postdoctoral fellow in the University of California at Berkeley's Department of Cellular and Molecular Biology, said the model so far had "substantial support" from institutions like the European Bioinformatics Institute and Affymetrix. The test, he said, will be in how well it gets adopted, "and how much hate mail I get."
"We're trying to write a specification that covers data from platforms that may not have even been invented yet, trying to think as broadly as possible yet trying to make it concrete," said Spellman. "If we overspecify, we're going to have problems. There's also no way to model all the biology that is going on. You can't hope to model the true complexity of biological experimentation."
The working group, a collaborative effort, includes representatives from Lion Bioscience, The Institute for Genomic Research, Rosetta Biosoftware, and the Institute for Systems Biology, among others.
The markup language project is an offshoot of the Microarray Gene Expression Data Society, an ad hoc committee of microarray researchers that is struggling to impose order and create standards for the proliferating forms of array data.
Last spring, the group proposed a new "minimal information" standard that specifies what annotation information must be included with any microarray experiment, and is now trying to get that standard widely adopted.
MAGE-ML should make it easier for microarray researchers to adopt that proposed system, said Spellman.
"It's building momentum, and in a year and a half we can say to journals: This is approved, you'll need to require people to submit data to be archived" in this form, he said.
More importantly, it can allow researchers to combine data from multiple experiments run under different conditions in different labs, said Spellman. "The hope is that it will make transporting data between places A and B that don't speak the same language possible."
For further information, see the MAGE web site.