EBI’s XEMBL Project Adds DoubleTwist’s AGAVE to its Alternative Data Formats


The European Bioinformatics Institute has decided to support DoubleTwist’s genomic annotation XML format, AGAVE, as part of its XEMBL project — an effort to make EMBL nucleotide sequence data available to users in a variety of alternative formats.

Alan Robinson of the EBI Industry Program said that the XEMBL project arose from user demand for EMBL nucleotide and protein sequence data in XML format. “Rather than invent yet another XML format for sequence data, we decided to first try the currently available specifications,” Robinson said. XEMBL also supports BSML, an XML format released as an open standard by LabBook.

Robinson said that the project, largely the work of the EBI’s Jean-Jack Riethoven, is also a means to explore different mechanisms of providing XML data, such as web forms, CGI scripts, and a SOAP [simple object access protocol] server. “Using someone else’s specification also meant that we didn’t have to develop tools to use the XML, so this is a huge resource-saving for us,” Robinson said.
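For readers unfamiliar with SOAP, the sketch below shows roughly what a request to a SOAP-based sequence service could look like when built with Python's standard library. It is only an illustration: the SOAP 1.1 envelope namespace is real, but the service namespace, the `getSequence` operation, its parameters, and the accession value are assumptions made up for this example and are not taken from the XEMBL server.

```python
import xml.etree.ElementTree as ET

# SOAP 1.1 envelope namespace (standard).
SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
# Hypothetical service namespace, used for illustration only.
SERVICE_NS = "urn:example:sequence-service"


def build_request(accession: str, xml_format: str) -> bytes:
    """Wrap a sequence request in a SOAP 1.1 envelope.

    The operation name and its two parameters are assumptions,
    not the interface of any real server.
    """
    ET.register_namespace("soapenv", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}getSequence")
    ET.SubElement(call, "accession").text = accession
    ET.SubElement(call, "format").text = xml_format
    # Returns UTF-8 bytes, ready to be sent as an HTTP POST body.
    return ET.tostring(envelope, encoding="utf-8")


if __name__ == "__main__":
    # "X00001" is a placeholder accession number.
    print(build_request("X00001", "agave").decode("utf-8"))
```

The same request could equally be made through a web form or a CGI script. The envelope simply gives client programs a structured, machine-readable way to ask for a record in a named format.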

DoubleTwist made AGAVE (Architecture for Genomic Annotation, Visualization, and Exchange) freely available to the life science community in July. Sun Microsystems, the Weizmann Institute of Science, and BioTools have also adopted the standard. The company originally developed AGAVE to build its annotated human genome database and its Prophecy product suite, which includes the database, a query system, and a graphical annotation viewer.

DoubleTwist also released several new data transformation tools and an AGAVE Java library. The library is a Java object model corresponding to the AGAVE document type definition, with methods for parsing and manipulating AGAVE XML.

The two XML standards supported by XEMBL address different needs: While BSML focuses on sequence information, AGAVE is more centered on genomic annotations. Output from publicly available databases can be converted to XML and used in programs that support the AGAVE and BSML formats. While applications that support these standards are so far limited to the DoubleTwist and LabBook viewers, Robinson expects to see a number of new programs appear soon, particularly toolkits that are able to import and export AGAVE and BSML submitted to the BioPerl and BioJava projects. “Having the XML tools packaged in these open source projects will put AGAVE and BSML capabilities on thousands of desktops as people update their libraries,” said Robinson.
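As a rough illustration of what converting database output to XML involves, the following Python sketch turns a minimal sequence record into XML and reads its annotations back. The element and attribute names (`sequence_record`, `feature`, and so on) are hypothetical and deliberately simplified. They do not follow the AGAVE or BSML specifications, and the record itself is toy data.

```python
import xml.etree.ElementTree as ET


def record_to_xml(record: dict) -> str:
    """Serialize a simple sequence record to XML.

    The tag names are invented for this sketch and are neither AGAVE nor BSML.
    """
    root = ET.Element("sequence_record", {"id": record["id"]})
    ET.SubElement(root, "description").text = record["description"]
    ET.SubElement(root, "sequence").text = record["sequence"]
    for feat in record.get("features", []):
        ET.SubElement(
            root,
            "feature",
            {
                "type": feat["type"],
                "start": str(feat["start"]),
                "end": str(feat["end"]),
            },
        )
    return ET.tostring(root, encoding="unicode")


def feature_spans(xml_text: str) -> list:
    """Read back (type, start, end) tuples from the XML produced above."""
    root = ET.fromstring(xml_text)
    return [
        (f.get("type"), int(f.get("start")), int(f.get("end")))
        for f in root.findall("feature")
    ]


if __name__ == "__main__":
    toy = {
        "id": "EXAMPLE1",  # placeholder identifier
        "description": "toy record",
        "sequence": "ACGT",
        "features": [{"type": "gene", "start": 1, "end": 4}],
    }
    xml_text = record_to_xml(toy)
    print(xml_text)
    print(feature_spans(xml_text))
```

An annotation-centered format would carry far richer feature structure than this, while a sequence-centered one would put more weight on the sequence element. The round trip shown here is the part that a shared toolkit would take off each developer's hands.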

Although the XEMBL server has yet to be announced publicly, “people have already found it and have been quietly evaluating it,” Robinson said. He added that the EBI has also had discussions with groups who would like to use the XEMBL server in a production environment.

But while enthusiasm for the project is high, Robinson warned that XML and SOAP are still relatively new technologies and “may still prove to be ‘the Emperor’s new clothes’ to our community.” While leading IT companies such as IBM, Microsoft, and Sun are backing XML-based technologies and the Interoperable Informatics Infrastructure Consortium is seeking to base its future standards on XML, Robinson said he remains “cautiously optimistic” about the technology.

“Many people are pinning their hopes on XML and SOAP as a technology that will help the integration of bioinformatics applications … However, unless there’s agreement in our own community on standards for the XML format and SOAP protocol, we’ll still have anarchy. This is where the I3C and OMG may provide a vital role in coordinating efforts and reaching consensus.”

Edward Kiruluta, chief technology officer of DoubleTwist — a participating company in the I3C — said he was “delighted” to have the EBI’s support of the standard. “These partnerships help to make AGAVE an even more robust platform that can enhance collaboration throughout the life sciences community.”

Yet Robinson said he’s doubtful that a single overriding technology will emerge to solve the industry’s data integration problems. “I’m agnostic to the technology used,” he said. “It’s the data that’s important, not the technology used to access it.”

— BT
