Microarray data standards, long in the making, have finally reached the doorsteps of scientific journals: Late last month, both the Nature research journals and The Lancet endorsed guidelines issued by the Microarray Gene Expression Data society. The Nature journals even went a step further, requiring authors to submit their microarray data to a public database.
The journals responded to a recent open letter by the MGED society, which strongly urged them to use a set of guidelines and a checklist based on the Minimal Information About a Microarray Experiment (MIAME) standard — published last year in Nature Genetics — in their decisions to accept microarray papers for publication. Their reply was quick: “We sent this open letter to the journals just a few weeks ago, and The Lancet called me back the next day,” Alvis Brazma, head of the European Bioinformatics Institute’s microarray informatics group and a member of MGED, told BioArray News at a conference in Boston last week. Cell has also expressed an interest, he added.
But the journals had been considering this issue for some time. Nature had already declared in April 2001 that it was watching the evolution of microarray standards closely. After receiving the open letter, the editors solicited comment from two prominent microarray groups not affiliated with MGED and started discussions with MGED members, including groups at TIGR and EBI, and the curators of both ArrayExpress and GEO, as well as with editors of the other Nature research journals, said Chris Gunter, associate editor for genetics and genomics at Nature, in an e-mail interview. “All felt that the checklist proposed was a reasonable one,” she said, and that the time for adopting standards had come. Starting Dec. 1, all submissions of microarray experiments to Nature and its sister journals must include information compliant with the MIAME standard.
The Lancet came to a similar conclusion: “We had been aware of the MIAME guidelines since they were published in Nature Genetics, but felt that until now they were not in a particularly useful format for our authors, reviewers, and readers,” said Virginia Barbour, the journal’s molecular medicine editor, in an e-mail interview. “The checklist [by the MGED society] …makes it much clearer what is required,” she said. The Lancet now encourages authors to comply with these guidelines but does not treat them as strict rules, according to Barbour. The journal is also currently considering whether to ask authors to submit microarray data to a public database.
The Nature journals have already taken this step: Data integral to a paper’s conclusions need to be submitted to the EBI’s ArrayExpress or the NCBI’s Gene Expression Omnibus, rather than being made available on the author’s own website. “We felt, and were assured by the curators, that the databases were finally ready for us to request such a step,” said Gunter. Another reason for this change: Some authors had abused their control over their websites by tracking the anonymous reviewers who visited them.
Other journals have not decided yet how to respond to the MGED society’s letter. Science is planning to publish the letter and ask for feedback from its readers, according to Becky Ham, a spokeswoman for the journal. “We do support the goals of the society but feel that at this point, discussion of standards among the researchers is still necessary and the way to go,” she said. At present, Science does not require microarray data to be submitted to a public database.
Genome Research has been listing microarray databases in its instructions for authors for over a year now but has not decided yet whether to adopt the MIAME standards. “I would probably never require them but I would be likely to request that any array paper indicated how many of the MIAME rules it followed,” said Laurie Goldman, executive editor of Genome Research. The main reason for making public databases mandatory, she said, is that “individual sites disappear, and one would hope that a government-funded site won’t.”
As of now, it is unclear how much the new journal rules will impact traffic to the public databases. If other journals follow suit, “perhaps load will increase twofold from now, which would be great for us,” said Alex Lash, who runs the NCBI’s GEO. Two years into its operation, GEO hosts more than 2,500 datasets. EBI’s ArrayExpress, launched earlier this year, contains nine gene expression experiments, but it is gearing up for more submissions: “Currently, we are developing our internal pipeline,” said Ugis Sarkans, database development coordinator in EBI’s microarray informatics team. Five curators are involved in preparing data for loading, he said.
But what is also important, Sarkans thinks, is how software vendors will support MAGE-ML, or microarray gene expression markup language, a data exchange format that allows files to be loaded into ArrayExpress. According to Brazma, MAGE-ML was expected to be adopted officially by the Object Management Group at its recent meeting in Helsinki. Several companies and institutes, he said, are currently testing MAGE-ML pipelines to ArrayExpress, among them Affymetrix, Lund University for BASE, MolMine for J-Express, the Sanger Institute for Microarray Data Analysis System (MIDAS), and TIGR for Microarray Data Manager (MADAM). Others have expressed an interest in establishing MAGE-ML functionalities, he said, among them Agilent, Iobion for GeneTraffic, Manchester University for maxd, Rosetta for Resolver, and Stanford University for the Stanford Microarray Database. In addition, to help researchers prepare their datasets for ArrayExpress, the EBI recently put the first fully functional version of MIAMExpress on its website, “a tool which asks you MIAME relevant questions and exports MAGE-ML files,” said Brazma.
But not all providers are going down the MAGE-ML road. Silicon Genetics now provides fields for entering the data required by the MIAME recommendations in the latest version of GeneSpring (5.0), which came out about two weeks ago, as well as in the GeNet 3.0 database. However, it has no plans to implement MAGE-ML functionalities, mainly out of space concerns.
“Microarray data is notoriously large anyway, and by using XML to store everything, you end up multiplying the space issues,” said Jordan Stockton, the company’s associate product manager.
GEO also pursues a “more minimalist approach” than ArrayExpress, said Lash. Most of the information is entered as text, not in the field format that ArrayExpress tends toward, he said. This, he said, keeps the approach flexible and allows the inclusion of information not covered by the MIAME guidelines.
In the end, MIAME may be implemented in different ways. MIAME is a recommendation, not a standard, said Stockton. “It’s difficult to say that you absolutely do [or] absolutely don’t comply,” he said. “I suspect that many people will be implementing a subset of the MIAME recommendations.”