EBI's Alvis Brazma Discusses the Standardization of Microarray Data



PhD in computer science, Moscow State University, 1988. Postdoctoral research at New Mexico State University, 1991-1992.

Associate professor and researcher at University of Latvia, Riga, since 1994. Visiting researcher at University of Helsinki, Finland, 1995-1997.

Currently serves as microarray informatics team leader at the European Bioinformatics Institute, in the European Molecular Biology Laboratory Outstation in Cambridge, UK.

Founding member of the Microarray Gene Expression Database working group (MGED), and head of the group’s annotation committee, which drafted the Minimum Information about a Microarray Experiment (MIAME) standard.

Q You just published a paper in the December 2001 issue of Nature Genetics on the MIAME standard for microarray data. The paper outlines a long list of information that should be included in every microarray experiment. How do you propose to disseminate this standard and get researchers to adhere to it?

A First, we are developing good software to support these standards. Eventually we will be encouraging journals and funding agencies to ensure adherence to this standard. But I hope the second step won’t be necessary.

Q I understand that EBI is planning a microarray database. When will it be launched?

A The plan is to try to launch the database by MGED 4, MGED’s annual conference, which will take place in Boston in February. We will have an operational web-based data submission tool, a rudimentary data query so people can see what’s in the database, and some good data in the database. We still haven’t gotten the hardware delivery, and if it doesn’t come by January it will be difficult to test whether the software works [before MGED].

Q Is this database MIAME-compliant?

A I would rather call it MIAME-supportive. The database can store all the data with all the detail and the structure of MIAME, but it will not impose any requirements for the data to be MIAME-compliant.

Q Are the other major gene expression databases, at places such as Sanger and the National Center for Biotechnology Information, MIAME-compliant?

A We are working with Sanger, which has a MIAME questionnaire in-house. People are required to enter data into their own database in a MIAME-compliant way. We are also collaborating with NCBI but we have slightly different goals. We are trying to make a more flexible data archive that will have more data than required by MIAME. They are also talking about importing MIAME-compliant files. Our tasks are different. We are thinking about how different datasets can be linked together, while as a data archive, they are more concerned with the flexibility of the database.

Q How long will it take before your database is in full operation?

A We will be operational to the extent that people can query data fully in about six months from now. But it will probably take six to twelve months more before we are fully operational.

Q How will MGED interact with microarray hardware and software manufacturers in the effort to get data to be MIAME-compliant?

A Both hardware and software manufacturers have been supportive. Rosetta Inpharmatics was one of the initiators of MGED. We will expect the hardware manufacturers to post their array designs in our database. They generally support this, but some have confidential information they won’t want to reveal about the design.

Q For example, the sequence of the oligonucleotides?

A Under MIAME, we are not requiring the full sequence. It will make our life easier and the data more valuable if we have it. But it will be enough if Affymetrix reveals what features apply to what genes.

Q One of the most difficult areas in microarray technology now seems to be analysis. How will MIAME help sort out this tangle?

A We need high-quality, reliable raw data to do higher-level mining. In proof-of-principle experiments we can show that genes cluster together, but in second-step research, when we want to map out gene networks, we see there is too much noise. So what we have to focus on is encouraging people to provide enough information about their experiments to help in this second-step analysis.

