Scientists at The Institute for Genomic Research (TIGR) are working to bring their microarray database up to the standards set by the Microarray Gene Expression Database Group (MGED), a community-wide effort initiated at the European Bioinformatics Institute (EBI).
This standard, the Minimum Information about a Microarray Experiment (or, in this acronym-crazy bioinformatics subculture, MIAME), is intended to ensure that microarray data can be standardized and exchanged between scientists in different labs. The standard requires that labs provide details about experimental and array design; samples; sample prep and labeling; hybridization conditions; measurements of array data; and controls.
While the TIGR group covered most of these areas when it first set up its database, the database director, Joseph White, noticed that many areas were lacking when he returned from the most recent MGED meeting last spring. So the lab sought to fill these gaps, not an easy task for an already established database and one that it is still completing.
"We track a fair amount of the information required for MIAME, but there are things right up until now that we have not captured," White said.
TIGR's effort to bring its database into compliance with the MIAME standard is among the first in what researchers at MGED hope will become a widespread movement to standardize microarray databases in accordance with the list of information in MIAME.
"What we encourage people to do is take a look at the listing, see which of the items are relevant to them, and look to see whether that's information that they're capturing in their databases," said Chris Stoeckert, a University of Pennsylvania genetics researcher who chairs the ontologies working group for MGED. "That's the approach that the TIGR people have taken."
TIGR's database, which it calls MAD (for microarray database), already required researchers to input information about the experiment at every stage, from wellplate, slide, and spot to analysis and normalization. But one of the major areas that the TIGR group needed to address in its database was establishing a controlled vocabulary. This vocabulary includes the names of the terms used in the database, the values associated with each term, and the sources for the terms.
"For example, on the analysis end, if you use a statistical term like a tail, the name of the term in this controlled vocabulary would be 'tail,' the value would be 'long,' 'short,' or 'moderate,' and the source would be the database where you have obtained the distribution," said White. Or if a person inputs information about a glass slide, there are five different values, or types of glass slides, in the database. The idea is that the user selects one value from the controlled vocabulary.
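The name, value, and source structure White describes can be sketched in a few lines of Python. This is a minimal illustration only, not TIGR's actual schema: the class name `VocabularyTerm`, its fields, and the source string are assumptions made for the example, and only the "tail" term and its three values come from White's description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VocabularyTerm:
    """One controlled-vocabulary entry (hypothetical structure)."""
    name: str              # the term itself, e.g. "tail"
    allowed_values: tuple  # the fixed set of values a user may pick from
    source: str            # where the term's definition comes from

    def validate(self, value: str) -> str:
        """Return the value if it is in the vocabulary, otherwise raise."""
        if value not in self.allowed_values:
            raise ValueError(
                f"{value!r} is not an allowed value for term {self.name!r}; "
                f"choose one of {self.allowed_values}"
            )
        return value


# The statistical "tail" term from White's example. The source string is a
# placeholder, since the article does not name the actual database.
tail = VocabularyTerm(
    name="tail",
    allowed_values=("long", "short", "moderate"),
    source="(database the distribution was obtained from)",
)

chosen = tail.validate("long")  # accepted: the value is in the vocabulary
```

Rejecting anything outside the fixed list is what makes the vocabulary "controlled": two labs entering the same term are forced to describe it with the same set of values.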
Another area of microarray data processing that MIAME seeks to standardize is sample description. "There are very well-defined ontologies for sample description in [medicine], the heart, the aorta, the brain structures," said White. For microarrays, these sample sources are captured with a consistent name and value.
But White's group has run into a significant obstacle: the MGED ontology working group is still in the process of defining the standard vocabulary for microarrays.
The group is planning to finalize this vocabulary, along with software for easily inputting the information into databases, at the fourth MGED conference, which is slated to take place in Boston between February 13th and 16th, 2002.
"We recognize that there are two things that need to be done next," said Stoeckert. "One is, given the set of guidelines, to create a set of criteria for people to use to see whether or not they've met these guidelines. The second is to create annotation forms that allow people to fill in the information to be captured for the MIAME guidelines."
White, Alvis Brazma of EBI, and others are developing a program that will allow bench scientists to simply input their information into standardized web page-based forms, which will then be translated into a standard microarray database written in MAML or MAGE-ML (Microarray Gene Expression Markup Language), the microarray markup language. In this way, the researchers hope to develop something like GenBank for microarray experiments.
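As a rough sketch of the form-to-markup idea, the Python below turns a dictionary of form fields into a small XML document using the standard library. The element and attribute names (`annotation`, `section`, `name`) and the sample field values are invented for illustration; they are not MAML or MAGE-ML, whose actual schemas are not described here.

```python
import xml.etree.ElementTree as ET


def form_to_xml(form_fields: dict) -> str:
    """Serialize submitted form fields as XML.

    The tag names are hypothetical placeholders, not real MAGE-ML elements.
    """
    root = ET.Element("annotation")
    for section, text in form_fields.items():
        # One child element per form field, labeled with the field's name.
        child = ET.SubElement(root, "section", {"name": section})
        child.text = text
    return ET.tostring(root, encoding="unicode")


# Made-up form input, standing in for what a bench scientist might type.
submitted = {"samples": "liver extract", "hybridizations": "overnight"}
document = form_to_xml(submitted)
```

Because the output is ordinary XML, any lab could parse it back with the same library, which is the property that makes a shared markup language useful for exchanging data between databases.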
A group of developers is meeting at the European Bioinformatics Institute in Hinxton, UK, this December to write the software that will enable researchers to load data onto MIAME-compliant databases, according to White.
While each lab will have to work this new software into its own database, the forms will serve as templates that it can use for this purpose, said Stoeckert. The MGED group has not yet figured out how it will make these templates available, whether on the MGED website (www.mged.org) or elsewhere, but will decide this by the next MGED conference.
Meanwhile, the ontology working group has submitted a paper to Nature Genetics, which it expects to be published soon and which will flesh out the ideas and thinking behind the guidelines in the MIAME standard.
This paper, along with the ready-made forms, may provide a needed boost to acronym-phobic microarray researchers who have so far been reluctant to address the issue of MIAME compliance.
While many labs have well-oiled laboratory systems they are reluctant to change, others see the minimum amount of information required under the standard as far beyond what they need for their research purposes.
"With MIAME, a lot of people think it's maximal," White noted. "But it's not minimal, it's minimal acceptable information. You can provide less information, but then it will be harder to interpret the results of your experiment."
The MIAME Standard: A Summary
The minimum information about a published microarray-based gene expression experiment should include a description of:
1. Experimental design: the set of the hybridization experiments as a whole
2. Array design: each array used and each element (spot) on the array
3. Samples: samples used, the extract preparation and labeling
4. Hybridizations: procedures and parameters
5. Measurements: images, quantitation, specifications
6. Controls: types, values, specifications
For a more detailed list of the MIAME standard, see the Microarray Gene Expression Database website, www.mged.org.
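The six sections summarized above lend themselves to a simple completeness check of the kind Stoeckert describes. The Python below is one possible sketch, not an MGED tool: the section keys, the function name `missing_sections`, and the sample record are all assumptions made for the example.

```python
# The six MIAME sections from the summary, as hypothetical record keys.
MIAME_SECTIONS = (
    "experimental_design",
    "array_design",
    "samples",
    "hybridizations",
    "measurements",
    "controls",
)


def missing_sections(record: dict) -> list:
    """Return the MIAME sections that are absent or empty in a record."""
    return [
        section
        for section in MIAME_SECTIONS
        if not str(record.get(section, "")).strip()
    ]


# A made-up, partially annotated experiment record.
record = {
    "experimental_design": "time course, 4 hybridizations",
    "array_design": "glass slide, spotted cDNA",
    "samples": "",  # present but left blank, so it counts as missing
    "measurements": "scanned images plus quantitation files",
}

gaps = missing_sections(record)  # sections still to be filled in
```

A check like this only confirms that each section has been filled in, not that its contents are adequate; judging adequacy would need the more detailed criteria the MGED group is still drafting.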