As bioinformatics data continue to expand in complexity, with new multidimensional data types such as tables and images entering the stream, the time-honored approach of quickly writing a Perl program to parse one format into another becomes problematic.
A potential solution has been available since the 1990s in the form of CORBA (Common Object Request Broker Architecture), a standard maintained by the Object Management Group (OMG), a software standards consortium.
In principle, CORBA could let applications and data types interoperate in a platform-independent way, but it has so far received a lukewarm response from the bioinformatics community. Critics complain that it is difficult to learn and to program, and that OMG's standards approval process is too cumbersome and slow for the fast-paced world of bioinformatics.
But as methods and data become increasingly object-like in their presentation, a new look at CORBA may be warranted.
A number of CORBA-based tools for bioinformatics have already appeared. Fabien Campagne, bioinformatics officer at Mount Sinai School of Medicine in New York, has written a CORBA implementation of the venerable Higgins-Sharp CLUSTALW multiple alignment tool.
According to Campagne, the freely downloadable application lets a user communicate with a CLUSTAL server from within any client application, such as a text editor. In the past, CLUSTAL was typically either hard-coded into a sequence analysis package or run from the UNIX command line.
Another sign of CORBA's growing popularity is last year's extension to Lion Bioscience's SRS, published by the SRS group at the European Bioinformatics Institute. It exposes the data managed by an SRS server through CORBA wrappers, allowing client applications such as visualization and data mining tools to access and query SRS servers remotely.
Pursuing the move from linear, text-based information to objects (information packaged together with the methods or processes that act on it), Jian Hu and colleagues at the Roslin Institute in Edinburgh have published a prototype CORBA-based genome mapping system that emphasizes database connectivity and a graphical user interface for the Internet environment.
Another CORBA-based approach to maps is an interface definition language (IDL) specification for genome maps, presented to the community by Emmanuel Barillot and colleagues at Infobiogen in Villejuif, France.
The CORBA IDL has been written and implemented for the RHdb radiation hybrid database and the HuGeMap database. Barillot said it can be generalized to all genome maps.
Jeremy Parsons of the European Molecular Biology Laboratory and Patricia Rodriguez-Tomé of Cereon Genomics have applied CORBA to a different problem: errors and redundancy in public EST databases. They say their CORBA-based JESAM software lets users build custom EST-derived databases, obtaining a different view of the data without repeating the complex intermediate processing that others have already done.
CORBA has also been put to work on the problem of accessing DNA chromatogram traces. The familiar four-color traces are the primary data source for all large-scale genomic and EST sequencing projects, and many downstream analyses, such as contig assembly and polymorphism detection, depend on them.
But obtaining and using traces is difficult because they are not collected and published centrally, and their volume is much larger than that of the base calls derived from them. Parsons and colleagues at EBI developed a CORBA-based client/server system, with a Java applet on the client side, that lets users look up traces in a repository.