Over the past few years, the Intelligent Systems for Molecular Biology conference has grown from a small, informal gathering of a few hundred researchers to a major, if not the major, event in the bioinformatics industry. Organizers for ISMB 2000, which was held at the University of California, San Diego, capped registration at 1,200 people after realizing they would not be able to accommodate all the demand.
ISMB's popularity underscores the rapid growth of the bioinformatics industry, and it also highlights another trend: the explosive increase in the amount of genomics data being generated.
In response to the boom, many people at this year's ISMB focused on discussing ways to standardize tools and terms. By doing so, researchers hope to reduce redundant efforts to solve problems that affect the entire industry, as well as to encourage bioinformaticists to speak the same language.
As one attendee put it, "We have gone from accumulating data individually to trying to figure out how we can share tools and information."
Researchers from academic institutions to big pharma used ISMB, as well as the Bioinformatics Open Source Conference that preceded it and the Bio-Ontologies Conference that was tacked on at the end, to discuss the various coordination efforts that are underway.
At BOSC, speakers representing the various Bio projects discussed developments in open source code. Two of the most interesting developments are BioCorba, a middleware interface that lets applications built on toolkits such as BioPerl and BioJava talk to one another, and BioXML, which can help researchers transmit genomic information over the Web.
Jason Stajich, a programming analyst at the Center for Human Genetics at Duke University Medical Center, said that his BioCorba program allows researchers to expand the limits of individual programming languages.
"You can't take a Perl program and have some Java code and just plop it in," said Stajich. "Corba allows you to treat it like it doesn't matter which language you're talking about. So you can write your program in one language and then use Corba to talk to objects [written] in a different language."
Stajich explained that BioCorba uses an object-oriented format to describe the functions of objects in such a way that a BioCorba server can retrieve them whether they are written with BioJava, BioPerl, or BioPython, the toolkits for the languages most commonly used in bioinformatics.
So, for instance, let's say you want to run a BioPerl program but your database is running in Python. BioCorba will automatically turn a Python sequence object into a Perl sequence object, allowing you to use the Perl program without having to rewrite database code.
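The idea of a shared, language-neutral contract can be sketched in a few lines. The example below is only an illustration in plain Python: it does not use CORBA or the BioCorba code, and every name in it (`SeqInterface`, `DatabaseRecord`, `DatabaseRecordAdapter`, `gc_content`) is hypothetical. It shows client code written against an agreed interface, with an adapter wrapping the database's native object so the client never needs to know how that object was implemented.

```python
from abc import ABC, abstractmethod


class SeqInterface(ABC):
    """Hypothetical shared contract, standing in for an interface definition."""

    @abstractmethod
    def get_id(self) -> str:
        ...

    @abstractmethod
    def get_seq(self) -> str:
        ...


class DatabaseRecord:
    """Hypothetical native object as the database side happens to store it."""

    def __init__(self, accession: str, residues: str):
        self.accession = accession
        self.residues = residues


class DatabaseRecordAdapter(SeqInterface):
    """Wraps a native record so it satisfies the shared contract."""

    def __init__(self, record: DatabaseRecord):
        self._record = record

    def get_id(self) -> str:
        return self._record.accession

    def get_seq(self) -> str:
        return self._record.residues


def gc_content(seq: SeqInterface) -> float:
    """Client code that depends only on the shared interface."""
    residues = seq.get_seq().upper()
    if not residues:
        return 0.0
    return (residues.count("G") + residues.count("C")) / len(residues)
```

In a real CORBA setup the contract would be written in an interface definition language and the two sides could run in different languages and processes; here both sides are Python purely to keep the sketch runnable.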
Stajich said he is not yet sure how much memory a computer will need to support BioCorba, but he is planning to run some experiments to test the program's limitations.
On another front, Brad Marshall of the Berkeley Drosophila Genome Project is developing BioXML. This extension to XML is designed to standardize codes for biological data that researchers are likely to transmit over the Internet.
"Let's say John Whoever has a server he's running in Perl or Java and he has a sequence he wants to give to someone over the Internet. He can now dump the contents into an XML format," Marshall said. "And BioXML is coming up with a good set of tags for biological data."
Next on Marshall's list is a plan to develop XML parsers for BioPerl, BioJava, and BioPython that can be applied to BioXML.
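Marshall's scenario of dumping a sequence into XML and parsing it on the other end might look roughly like the following sketch, which uses Python's standard `xml.etree.ElementTree` module. The tag and attribute names (`sequence`, `id`, `organism`, `residues`) are invented for illustration and are not BioXML's actual tag set; the function names are likewise hypothetical.

```python
import xml.etree.ElementTree as ET


def sequence_to_xml(seq_id: str, residues: str, organism: str) -> str:
    """Serialize one sequence record to an XML string (made-up tags)."""
    root = ET.Element("sequence", attrib={"id": seq_id})
    ET.SubElement(root, "organism").text = organism
    ET.SubElement(root, "residues").text = residues
    return ET.tostring(root, encoding="unicode")


def xml_to_sequence(xml_text: str) -> dict:
    """Parse the XML produced above back into a plain dictionary."""
    root = ET.fromstring(xml_text)
    return {
        "id": root.get("id"),
        "organism": root.findtext("organism"),
        "residues": root.findtext("residues"),
    }


if __name__ == "__main__":
    document = sequence_to_xml("seq1", "ATGC", "Drosophila melanogaster")
    print(document)
    print(xml_to_sequence(document))
```

Because both ends agree on the tags, the receiver can rebuild the record regardless of which language produced the file, which is the point of settling on a common tag set.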
WHAT'S IN A NAME?
While some bioinformaticists are concerned with making sure that everyone can write programs in common languages, or at least has the right tools to do quick translations, others are trying to standardize the definitions of the objects being studied.
"When you refer to the eye of the Drosophila, do you mean just the surface or the entire eye?" asked Michael Ashburner, research coordinator at the EMBL-European Bioinformatics Institute and a coordinator of the Gene Ontology project.
Such a question points to the two main problems bioinformaticists face when they try to communicate with one another: On the one hand, people could use one word to mean two different things, and on the other hand, people could use two different words for the same thing.
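Both problems can be made concrete with a toy controlled vocabulary. In the sketch below, the identifiers (`EX:0001`, `EX:0002`), the labels, and the function names are all invented for illustration and do not come from the Gene Ontology or any other real ontology. Two different labels resolve to the same term, while the bare word "eye" is flagged as ambiguous because it could mean either term.

```python
from collections import defaultdict

# Hypothetical terms: identifier -> preferred name.
TERMS = {
    "EX:0001": "entire eye",
    "EX:0002": "eye surface",
}

# Hypothetical labels people use in practice, each tied to a term.
LABELS = [
    ("entire eye", "EX:0001"),
    ("whole eye", "EX:0001"),    # two words for the same thing
    ("eye surface", "EX:0002"),
    ("eye", "EX:0001"),          # one word for two different things
    ("eye", "EX:0002"),
]


def build_index(labels):
    """Map each lowercased label to the set of term IDs it may refer to."""
    index = defaultdict(set)
    for label, term_id in labels:
        index[label.lower()].add(term_id)
    return index


def resolve(label, index):
    """Return the single term ID for a label, or raise if unknown or ambiguous."""
    ids = index.get(label.strip().lower(), set())
    if not ids:
        raise KeyError(f"unknown label: {label!r}")
    if len(ids) > 1:
        raise ValueError(f"ambiguous label {label!r}: {sorted(ids)}")
    return next(iter(ids))
```

A shared ontology plays the role of `TERMS` and `LABELS` here: once everyone exchanges identifiers rather than free-text labels, synonyms collapse to one term and ambiguous words have to be disambiguated before data is shared.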
Two years ago at ISMB in Montreal, a few dozen people began to ponder the ontologies conundrum. And this year about 100 people stayed on for an extra day to attend the Bio-Ontologies conference.
Sponsored by SmithKline Beecham, the conference's ultimate goal is to create standard guidelines for the bioinformatics industry.
"Everybody's got their way of talking about a gene and protein and metabolic pathways and all these other things," said Robin McEntire, principal computing scientist at SmithKline Beecham. "What we need is a consistent vocabulary so that we're all talking the same language."
McEntire said that more uniformity would allow companies to better leverage the available data rather than waste time and resources on decoding information.
"We want everybody to use the same one [ontology] because then what we can start doing is buying information," McEntire said. "The competitive advantage is taking that information and doing something smarter than the next guy with it."
Although no concrete steps were taken during the conference, McEntire said the strong interest in the ontology project was encouraging.
As a result, he said, the organizers resolved to meet again in November and start planning the creation of ontologies for different life science subdivisions, such as genetic regulation and bio-pathways.
Eric Neumann, vice president of life science informatics at 3rd Millennium, and Vincent Shachter, head of bioinformatics research at Hybrigenics in France, are already working on a separate ontology to standardize definitions for the gene regulatory and metabolic networks.
Yet disagreements also abound within the Bio-Ontologies Consortium, indicating that the road to an ordered bioinformatics world will be filled with obstacles.
McEntire said that supporters of the bio-ontology project must still make critical decisions about how they will gain commercial acceptance for the ontologies, which tools will be used to develop them, and how contributors will submit their suggestions for review.
After proving that a case for bio-ontologies exists, organizers are now faced with the challenge of getting down to business.
As McEntire conceded, "I think there's a lot of interest, but we now need to be more concrete."