Judging by the way IBM has been promoting federated information access, you might think it’s a new technology that will solve all your information integration problems. But federated access isn’t new, and in many cases it may not be the best answer to your integration challenges.
Vendors, including Oracle, have been providing federated information access solutions for years. These solutions let you read or update data in multiple distributed data stores as easily as if the data were loaded into a single database. Data are accessed in real time, and most solutions provide data location transparency: users need not know where each piece of data physically resides. Oracle Distributed SQL, a feature of the Oracle database for more than 10 years, can transparently integrate multiple data stores, including other databases, flat files, and Web services.
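As a small, single-machine analogy for querying several stores as if they were one, the sketch below uses SQLite's `ATTACH DATABASE` statement from Python's standard `sqlite3` module. It is not Oracle Distributed SQL and says nothing about how Oracle implements federation; the two in-memory databases merely stand in for separate data stores. The table names, gene IDs, and expression values are hypothetical.

```python
import sqlite3

# The "main" database stands in for one data store (a hypothetical gene catalog).
conn = sqlite3.connect(":memory:")
# ATTACH adds a second, independent database under the schema name "assays"
# (a hypothetical store of assay results).
conn.execute("ATTACH DATABASE ':memory:' AS assays")

conn.execute("CREATE TABLE main.genes (gene_id TEXT PRIMARY KEY, symbol TEXT)")
conn.execute("CREATE TABLE assays.results (gene_id TEXT, expression REAL)")
conn.executemany(
    "INSERT INTO main.genes VALUES (?, ?)",
    [("G1", "TP53"), ("G2", "BRCA1")],
)
conn.executemany(
    "INSERT INTO assays.results VALUES (?, ?)",
    [("G1", 2.5), ("G2", 0.75), ("G1", 3.5)],
)

# One SQL statement joins data held in two separate databases. Because each
# table name is unique across the attached databases, the query can omit the
# schema prefixes: the caller does not need to know which store holds which table.
rows = conn.execute(
    """
    SELECT g.symbol, AVG(r.expression) AS mean_expression
    FROM genes AS g
    JOIN results AS r ON r.gene_id = g.gene_id
    GROUP BY g.symbol
    ORDER BY g.symbol
    """
).fetchall()

print(rows)  # [('BRCA1', 0.75), ('TP53', 3.0)]
```

A real federated system adds network access, security, and query planning across heterogeneous sources, none of which this toy example attempts.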
In life sciences discovery, federated queries commonly span a range of data types such as XML, images, text, and Web pages. As the analytical capabilities of databases expand, users are also incorporating the results of data mining, text mining, and statistical analysis into these queries.
Federated access is invaluable for ad hoc queries. It is also an excellent tool for querying large amounts of data that are infrequently accessed, as well as for queries on data that you have permission to access but not move.
However, federated access is only part of the information integration story. “Consolidation” and “information sharing” are alternative technologies that can be used to integrate data. You should understand and weigh the issues associated with each before choosing a solution that meets your needs.
Consolidation involves moving your distributed data into a single database and managing them in that central location. This can require substantial planning and upfront effort. But because all the data are in the local database, performance improves: there is no need to query remote databases over a slow network. And with only a single database to manage, there is great potential to reduce your costs. Another benefit is that data are more readily available to users, because access does not depend on the availability of multiple databases. But consolidation won’t be practical if you don’t own all the data stores, or if your environment includes legacy applications that depend on particular data stores.
Information sharing may provide the best of both the federated and consolidated approaches. Data remain local to their owners but can also be consolidated for efficient access and autonomous operation. Data can be shared using methods such as message queuing and replication. Availability is enhanced because the data an application requires are local to it, and multiple copies of the data protect against data errors, corruption, and disasters.
Oracle Streams is an information-sharing feature that automatically maintains data across multiple distributed databases and applications. You could use it to set up an IT architecture that automatically captures new versions of public data sources such as EMBL, identifies the incremental updates since the last version was brought in-house, and applies only those changes. Some organizations have started using information-sharing technologies to build bioanalytical pipelines for data preparation and analytical functions.
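To make the "identify the incremental updates, then apply them" cycle concrete, here is a minimal sketch. It does not use or imitate the Oracle Streams API. It simply assumes that each release of a public data source can be loaded as a mapping from accession number to record text, compares two releases, and applies the differences to an in-house copy. The function names, accession numbers, and sequences are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ChangeSet:
    """Incremental differences between two releases, keyed by accession."""
    inserts: dict[str, str] = field(default_factory=dict)
    updates: dict[str, str] = field(default_factory=dict)
    deletes: set[str] = field(default_factory=set)


def diff_releases(previous: dict[str, str], current: dict[str, str]) -> ChangeSet:
    """Identify records added, changed, or removed since the previous release."""
    changes = ChangeSet()
    for accession, record in current.items():
        if accession not in previous:
            changes.inserts[accession] = record      # new in this release
        elif previous[accession] != record:
            changes.updates[accession] = record      # content changed
    changes.deletes = set(previous) - set(current)   # dropped from this release
    return changes


def apply_changes(local: dict[str, str], changes: ChangeSet) -> None:
    """Apply only the incremental changes to the in-house copy, in place."""
    local.update(changes.inserts)
    local.update(changes.updates)
    for accession in changes.deletes:
        local.pop(accession, None)


# Usage with hypothetical releases of a small sequence collection.
release_1 = {"X00001": "ACGT", "X00002": "GGCC", "X00003": "TTAA"}
release_2 = {"X00001": "ACGT", "X00002": "GGCA", "X00004": "CCGG"}

in_house = dict(release_1)                # copy loaded from the last release
changes = diff_releases(release_1, release_2)
apply_changes(in_house, changes)
# changes.inserts == {"X00004": "CCGG"}, changes.updates == {"X00002": "GGCA"},
# changes.deletes == {"X00003"}, and in_house now matches release_2.
```

Comparing whole snapshots is the simplest way to find changes; a production pipeline might instead capture changes as they occur rather than rescanning each release, but the apply step would look much the same.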
All of these information-access technologies play critical roles in the IT infrastructure of life sciences organizations. While not new, federated information access is an important solution. But don’t let the federated computing hype lead you to ignore alternative solutions that can bring automation to the complex world of information integration in discovery.
Opposite Strand is a forum for readers to express opinions and ideas about trends and issues in genomics. Submissions should be kept to 550 words and may be sent by e-mail.