HINXTON, UK--At least nine public research institutions that generate gene expression information using DNA microarray technology have agreed to begin submitting their results to a new public archive. The European Bioinformatics Institute (EBI) here, an outstation of the European Molecular Biology Laboratory in Heidelberg, Germany, will build the database, which it intends to unveil later this year.
Alan Robinson, who will direct the project with EBI colleagues Alvis Brazma and Jaak Vilo, told BioInform the planned repository of detailed gene expression profiles is already being considered one of EBI's most significant products, on the scale of the structural and protein databases, such as Swiss-Prot, for which the institute is already known. According to EBI, the expression database will facilitate the cross-validation of data obtained by different technologies and enable the community to characterize various techniques, error rates, benchmarks, and gold standards.
The need for such a centralized bank is born of recent technological breakthroughs in DNA microarray and chip technology that are enabling laboratories to monitor gene expression on a genome-wide scale. Robinson said that as gene expression data have proliferated, they have been scattered throughout the research community in a variety of formats, making analysis "incredibly frustrating." Many researchers currently exchange microarray analysis results via e-mail, Robinson said, adding, "Imagine how difficult the world would be if that's what we had done with sequence data--if it was available from 1,000 different websites with each lab doing its own data. We'd never be having the revolution in biotech that we are now."
"We are about to have a data explosion," added Brazma, "and this is the time to start thinking about standards, otherwise it will be out of hand."
Genomics scientists worldwide seem to agree. Since outsiders first learned of EBI's plan several months ago, the response has been staggering, Robinson said. Brazma, who is coordinating an invitation-only meeting of potential participants, remarked, "You wouldn't believe the number of e-mails we get. We're finding that everybody wants to be involved."
Nine laboratories have already committed to submitting gene expression data gathered from human, mouse, rat, C. elegans, yeast, Drosophila, and E. coli genomes. They include: the European Molecular Biology Laboratory; Sanger Centre; the UK's Medical Research Council; Germany's cancer research institute, DKFZ; Stanford University; University of California, San Francisco; Massachusetts Institute of Technology; the US National Cancer Institute; and the US National Human Genome Research Institute. Robinson said he expects other organizations to sign on in coming weeks.
Support is also anticipated from 21 pharmaceutical companies that will lend financial assistance to the project by way of EBI's Industry Programme. Said Robinson, "Some companies have already said they will contribute [data] as well. It will give them the opportunity to test their own technology by creating a gold standard." Robinson and Brazma will apply for additional financial assistance from the European Union.
Funding aside, computational obstacles add up to the biggest challenge the project faces. Robinson explained that the technology required to run a gene expression database is somewhat different from that required for a DNA sequence database. "With a sequence database, people might be interested in one, five, 10, or maybe 20 sequences. But in terms of microarrays, you're dealing with 1,000, 6,000, or 10,000 [items] at a time," he said, adding, "Web technology is no good for that."
CORBA will likely form the skeleton of the database, Robinson said, "so a computer program could speak to our gene expression database directly using standards proposed through the Object Management Group and the Life Sciences Research Task Force." CORBA-based standards for sharing and analyzing gene expression data are already under development by the Object Management Group, and EBI standards will be agreed upon by contributing laboratories. Ultimately, said Robinson, analysis tools that conform to Life Sciences Research specifications will plug into EBI's database. He contended, "We are not trying to foist a database upon anybody. We are trying to get groups of people to put it together."
About 10 EBI staff will collaborate to design complementary software tools, but Robinson and Brazma said they hope the project will stimulate development of applications by others, too. "The EBI will provide some tools that will allow you to search and query the database, visualize it, datamine it, what have you," Robinson remarked. "And then," he continued, "I would hope because we will have open standards as regards CORBA, people could use our database and search against it or, if need be, actually grab the database and bring it to their place. Because of standards, they'd be able to use their own tools in-house."
"We feel we have a responsibility to do something," Robinson said, acknowledging, "With the ground changing around you all the time, it won't be an easy job."