For bioinformatics researchers at the Center for Genome Research at Bielefeld University in Germany, currently available open source methods for data integration were lacking an important element: While systems such as BioMoby, Isys, and MyGrid provide access to multiple data sources, researchers still had to transfer data in and out of other systems in order to perform their analysis. “Most of these other [integration] systems can be used to gather a lot of information from different sources, so you can get annotations, for example, but it is not possible with these systems alone to run a complete genome annotation,” said Alexander Goesmann, a bioinformaticist at Bielefeld University.
Rather than cobble together a patchwork of in-house and outside systems, Goesmann and his colleagues created a single framework that they called BRIDGE (Bioinformatics Resource for the Integration of Heterogeneous Data from Genomic Explorations), which includes an integration layer as well as several application components that plug into the system and communicate with one another.
The modules include GenDB, a genome annotation system for prokaryotic genomes; EMMA, a MIAME-compliant repository for gene expression data and experimental information; ProDB, a repository for proteomics data similar to EMMA; and GOPArc, a system for analyzing and visualizing Gene Ontology and pathway data.
The components can be used in concert to streamline analytical processes that might otherwise require users to transfer data between a number of databases and applications. For example, the GenDB, EMMA, and GOPArc modules can be used to automatically map up- or down-regulated genes identified in EMMA onto the annotated genes in GenDB as well as the KEGG metabolic pathways in GOPArc.
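As a rough illustration of the kind of cross-module mapping described above, the following Python sketch joins expression results to gene annotations and then to pathway membership. It is not BRIDGE code: the class `ExpressionResult`, the functions `regulated` and `map_to_pathways`, the threshold, and the identifiers in the data are all hypothetical stand-ins for what EMMA, GenDB, and GOPArc would supply.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExpressionResult:
    """Hypothetical stand-in for one expression measurement."""
    gene_id: str
    log2_ratio: float


def regulated(results, threshold=1.0):
    """Return gene_id -> 'up' or 'down' for genes at or beyond the threshold."""
    return {
        r.gene_id: ("up" if r.log2_ratio > 0 else "down")
        for r in results
        if abs(r.log2_ratio) >= threshold
    }


def map_to_pathways(regulation, annotations, pathways):
    """Map regulated genes onto pathways via their annotation.

    annotations: gene_id -> enzyme label (stand-in for a genome annotation)
    pathways:    pathway name -> set of enzyme labels (stand-in for pathway data)
    """
    hits = {}
    for gene_id, direction in regulation.items():
        enzyme = annotations.get(gene_id)
        if enzyme is None:
            continue  # gene has no annotation, so it cannot be placed
        for name, enzymes in pathways.items():
            if enzyme in enzymes:
                hits.setdefault(name, []).append((gene_id, direction))
    return hits
```

In an integrated system the three inputs would come from separate modules behind one interface, so the user would not need to export and re-import files between these steps.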
The integration piece is an extension of O2DBI, an object-oriented layer previously developed by Jörn Clausen at Bielefeld University to map data objects to a relational database. O2DBI automatically generates code for accessing and manipulating the objects, but is only responsible for one relational database at a time, so BRIDGE acts as an additional layer that links multiple relational databases, Goesmann said.
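The layering Goesmann describes can be sketched in a few lines of Python, with the caveat that this is an illustration under stated assumptions rather than the actual O2DBI or BRIDGE code. The names `ObjectStore`, `LinkLayer`, and `make_demo_layer`, the table layouts, and the toy rows are all hypothetical. Each `ObjectStore` handles exactly one relational database, and `LinkLayer` sits on top to answer a query across several of them.

```python
import sqlite3


class ObjectStore:
    """One object type backed by one table in one relational database."""

    def __init__(self, conn, table, key_column):
        # table and key_column are trusted constants here, not user input.
        self.conn = conn
        self.table = table
        self.key_column = key_column

    def fetch(self, key):
        """Return the row for key as a dict, or None if it is absent."""
        cur = self.conn.execute(
            f"SELECT * FROM {self.table} WHERE {self.key_column} = ?", (key,)
        )
        row = cur.fetchone()
        if row is None:
            return None
        columns = [d[0] for d in cur.description]
        return dict(zip(columns, row))


class LinkLayer:
    """Additional layer that links several single-database stores."""

    def __init__(self):
        self.stores = {}

    def register(self, name, store):
        self.stores[name] = store

    def fetch_linked(self, key):
        """Look the same key up in every registered store."""
        return {name: store.fetch(key) for name, store in self.stores.items()}


def make_demo_layer():
    """Build two separate in-memory databases holding toy data."""
    annotation_db = sqlite3.connect(":memory:")
    annotation_db.execute(
        "CREATE TABLE gene (gene_id TEXT PRIMARY KEY, product TEXT)"
    )
    annotation_db.execute("INSERT INTO gene VALUES ('gene_a', 'toy product')")

    expression_db = sqlite3.connect(":memory:")
    expression_db.execute(
        "CREATE TABLE measurement (gene_id TEXT PRIMARY KEY, log2_ratio REAL)"
    )
    expression_db.execute("INSERT INTO measurement VALUES ('gene_a', 1.8)")

    layer = LinkLayer()
    layer.register("annotation", ObjectStore(annotation_db, "gene", "gene_id"))
    layer.register("expression", ObjectStore(expression_db, "measurement", "gene_id"))
    return layer
```

Calling `make_demo_layer().fetch_linked("gene_a")` returns one record per database, keyed by store name, which is the kind of cross-database view that a single-database mapping layer cannot provide on its own.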
The framework also includes a user management system, which allows administrators to define appropriate access privileges for users, and a general project management system that tracks specific projects. Each project can involve one or more of the core components, and the system keeps track of them so that end users don’t have to know where the data they are using for a particular project comes from.
Goesmann said that BRIDGE was developed with an eye toward systems biology research, and the framework was built so that new components could be plugged in as necessary. New components on the drawing board include a module for displaying genome comparison data, as well as a component for visualizing stress-related data.
Goesmann said the group is also considering adding new pathway data sets to the GOPArc module, which currently only uses KEGG.
One drawback of the system, Goesmann said, is that it is limited to prokaryotic genomes, primarily because the GenDB system, which is the framework’s main storage component for genomic data, was originally set up for prokaryotes and has not yet been updated to include introns, exons, and other more complex features of eukaryotic genomes. But this problem can be solved by extending the model, Goesmann noted, which “isn’t a big deal.” He said that he and his colleagues expect to release a new version of BRIDGE for eukaryotic genomes by the end of the year, but warned that its functionality will be limited because gene prediction methods for eukaryotes are not as effective as those for prokaryotes.
Goesmann said the Bielefeld team’s goal is to make the BRIDGE system widely available, but they are still working on ways to improve the installation process, which currently “takes some experience and manpower.” The system is currently installed at the Max-Planck Institute, and Goesmann and his colleagues plan to have a simplified version of the open source system available by June, which they will offer to non-commercial users for free and to commercial users under a licensing agreement.
A paper describing the BRIDGE system in more detail appeared in the December 19 issue of the Journal of Biotechnology [106 (2003) 157-167].