ARLINGTON, Va.--Bioinformatics experts who attended a meeting of the Object Management Group here last month told BioInform that the group's Life Sciences Research domain task force is several steps closer to issuing its first set of CORBA standards for sequence analysis software. The task force is also gaining momentum toward standards in several other areas that would make bioinformatics tools interoperable. "A lot of the questions that pharmaceutical companies may have had about whether the Life Sciences Research group is going to get anything accomplished are going to be answered soon," said Tim Clark of Millennium Pharmaceuticals.
Clark explained the drive behind making software from different vendors interoperable: "American Home Products has licensed our technology, including software, and they want to know, now that they've made a deal with NetGenics, can our software interoperate. They want to be able to create best-of-breed suites of applications and not put all their eggs in one basket. That's very reasonable." Clark added, "I think the answer is going to be positive."
On sequence analysis specifications, eight bioinformatics organizations that submitted five separate proposals for a standard at the task force's last meeting in November are now collaborating to write a single standard. In addition to Millennium and NetGenics, the collaborators are Concept-5, the European Bioinformatics Institute, Genome Informatics, Molecular Applications Group, Neomorphic Software, and Oxford Molecular. The standard, which members expect to present at a meeting of the Object Management Group in Tokyo this spring, will be the first to be issued by the Life Sciences Research domain task force. "It means a lot to us," said Eric Neumann of NetGenics. "We know it's a central piece for a lot of further activities." He added, "We're going to come up with something that is very forward-looking that a lot of groups can use readily for many types of sequence analyses."
Neumann said there is little disincentive for competing companies to collaborate on the standard. "Everyone is contributing their best skills, because the nature of CORBA is that they don't have to worry about internal representations or implementations; it's just about getting the pieces working together."
The specifications will address two main components of sequence analysis software--analysis machinery and bio objects, Steve Chervitz of Neomorphic told BioInform.
"We've had a lot of productive discussions and have agreed on sections of the bio objects interfaces, but there's more to go," he said. "It's a bit more involved than analysis machinery."
Analysis machinery specs, he explained, "will concern how you run an analysis that takes input parameters and generates output data." On this component, Chervitz said the bioinformatics group is working with a group outside of the Life Sciences Research domain called CORBA-Med.
"They're more closely tied to the healthcare industry and are working on medical applications, diagnostics, and clinical work as opposed to bioinformatics, but have similar needs in terms of analysis machinery," Chervitz told BioInform. Ideally, he said, the two groups could conceive of one analysis machinery specification that could be used for all those applications. That could facilitate future integration of sequence analysis data with clinical and diagnostic data, as well as avoid overlapping standards. "The Object Management Group doesn't like to have several different standards that do essentially the same thing," Chervitz remarked.
Cheminformatics and macromolecular structures
Working groups developing CORBA standards for several other types of life sciences-related technology also reported progress after the meeting here.
The Object Management Group approved a request for information that will be distributed by the working group on macromolecular structures, and a cheminformatics working group "got a shot in the arm," according to Chervitz. Although the group has been generating discussion for some time, "more companies expressed strong interest in cheminformatics at this meeting," Chervitz said.
Clark added, "Previous meetings had this rotating representation of one cheminformatics vendor and maybe one or two users. This meeting had 18 people in the cheminformatics working group and it really got off the ground." Representation from MDL Information Systems and Tripos especially impressed participants in the working group. "Seeing MDL's president tell all those users that his company wanted to do CORBA interface was very important," Clark said. The working group said it would release a request for information at its next meeting, in Philadelphia in March.
Neumann contended that macromolecular structure could be a bridge between bioinformatics and cheminformatics technology. "It seems that the small-molecule issues and the indexability of cheminformatics bridges over when you talk about protein structures and how drugs bind to proteins and have their effect," he said.
Another point at which cheminformatics data could be merged with bioinformatics data is gene expression, he said. One possibility would be to "develop assays that are integral to cheminformatics which may bridge over to gene expression so that you can have an expression profile and activity relations all merged together."
No one has an exact formula for how bioinformatics and cheminformatics will be integrated, Neumann said, "but everyone knows that because the groups already do work together at some level, the informatics has to be integrated."
Genetic map specifications and gene expression data
Neumann also reported "strong activity" in response to a request for proposals on genetic map specifications that was released in November, and said that a working group on gene expression analysis has been collecting responses to its request for information on microarray data usage.
Greg Miller, manager of bioinformatics systems development at Ariad Pharmaceuticals, said the gene expression working group agreed here to extend the deadline for responses to its request for information in order to "engineer some increased publicity for the subject area."
"Essentially, the Object Management Group puts these things on their web page (http://www.omg.org/homepages/lsr) and then relies on those who are casually interested to notice or to be informed by those who care more," Miller said. "We're trying to collect information from the community at large about what's going on in gene expression, what people are looking for, and what requirements might be for any standard proposed." He continued, "It's very important to get as much input as we can, not just from people involved in the group but from those who are involved in the subject area and don't necessarily care about the technology."
The gene expression information request seeks input on the current state of the art in microarray gene expression systems; existing standardization efforts in the field, such as the GATC standard that Affymetrix and Molecular Dynamics are promulgating; the architectures currently used for gene expression systems; the software components that would be combined to build such systems; and the potential interfaces between those components. Said Miller, "The request for information is the basis for the Object Management Group standard adoption process, but it is also an important way for the community to get together and see what is the current state of the art." He added, "This technology is so new that it's not as if we can do a literature search to find out everything that is going on in the area."
What motivates Miller's interest in a gene expression data standard is simple: "The Hoechst-Ariad Genomics Center placed a very large investment in large-scale, high-throughput gene expression profiling as one of our primary research technologies for identifying targets of interest," he explained. "Our goal is to do this more systematically and as thoroughly as anybody out there." Starting the ball rolling now toward interoperability among all the pieces of software and hardware that go into a gene expression analysis system will "make it easier for us to analyze all the data that are so important to us," Miller said.