The European Bioinformatics Institute has decided to support DoubleTwist's genomic annotation XML format, AGAVE, as part of its XEMBL project, an effort to make EMBL nucleotide sequence data available to users in a variety of alternative formats.
Alan Robinson of the EBI Industry Program said that the XEMBL project (www.ebi.ac.uk/xembl) arose from user demand for EMBL nucleotide and protein sequence data in XML format. "Rather than invent yet another XML format for sequence data, we decided to first try the currently available specifications," Robinson said. XEMBL also supports BSML, an XML format released as an open standard by LabBook.
Robinson said that the project, largely the work of the EBI's Jean-Jack Riethoven, is also a means to explore different mechanisms of providing XML data, such as web forms, CGI scripts, and a SOAP (Simple Object Access Protocol) server. "Using someone else's specification also meant that we didn't have to develop tools to use the XML, so this is a huge resource saving for us," Robinson said.
DoubleTwist made AGAVE (Architecture for Genomic Annotation, Visualization, and Exchange) freely available to the life science community in July. Sun Microsystems, the Weizmann Institute of Science, and BioTools have also adopted the standard. The company originally developed AGAVE to build its annotated human genome database and its Prophecy product suite, which includes the database, a query system, and a graphical annotation viewer.
DoubleTwist also released several new data transformation tools and an AGAVE Java library at www.agavexml.org. The AGAVE Java library is a Java object model corresponding to the AGAVE document type definition with methods for parsing and manipulating AGAVE XML.
The two XML standards supported by XEMBL address different needs: BSML focuses on sequence information, while AGAVE is centered more on genomic annotations. Output from publicly available databases can be converted to XML and used in programs that support the AGAVE and BSML formats. Although applications that support these standards are so far limited to the DoubleTwist and LabBook viewers, Robinson expects a number of new programs to appear soon, particularly toolkits submitted to the BioPerl and BioJava projects that can import and export AGAVE and BSML. "Having the XML tools packaged in these open source projects will put AGAVE and BSML capabilities on thousands of desktops as people update their libraries," said Robinson.
Although the XEMBL server has not yet been announced publicly, people have already found it and have been quietly evaluating it, Robinson said. He added that the EBI has also held discussions with groups that would like to use the XEMBL server in a production environment.
But while enthusiasm for the project is high, Robinson warned that "XML and SOAP are still relatively new technologies and may still prove to be 'the Emperor's new clothes' to our community." Although leading IT companies such as IBM, Microsoft, and Sun are backing XML-based technologies, and the Interoperable Informatics Infrastructure Consortium (I3C) is seeking to base its future standards on XML, Robinson said he remains only cautiously optimistic about the technology.
"Many people are pinning their hopes on XML and SOAP as a technology that will help the integration of bioinformatics applications ... However, unless there's agreement in our own community on standards for the XML format and SOAP protocol, we'll still have anarchy," he said. "This is where the I3C and OMG [Object Management Group] may provide a vital role in coordinating efforts and reaching consensus."
Edward Kiruluta, chief technology officer of DoubleTwist, a participating company in the I3C, said he was delighted to have the EBI's support for the standard. "These partnerships help to make AGAVE an even more robust platform that can enhance collaboration throughout the life sciences community," he said.
Yet Robinson said he's doubtful that a single overriding technology will emerge to solve the industry's data integration problems. "I'm agnostic to the technology used," he said. "It's the data that's important, not the technology used to access it."