EBI's XEMBL Project Adds DoubleTwist's AGAVE to its Alternative Data Formats


The European Bioinformatics Institute has decided to support DoubleTwist’s genomic annotation XML format, AGAVE, as part of its XEMBL project — an effort to make EMBL nucleotide sequence data available to users in a variety of alternative formats.

Alan Robinson of the EBI Industry Program said that the XEMBL project arose from user demand for EMBL nucleotide and protein sequence data in XML format. “Rather than invent yet another XML format for sequence data, we decided to first try the currently available specifications,” Robinson said. XEMBL also supports BSML, an XML format released as an open standard by LabBook.

Robinson said that the project, largely the work of the EBI’s Jean-Jack Riethoven, is also a means to explore different mechanisms of providing XML data, such as web forms, CGI scripts, and a SOAP [simple object access protocol] server. “Using someone else’s specification also meant that we didn’t have to develop tools to use the XML, so this is a huge resource-saving for us,” Robinson said.

DoubleTwist made AGAVE (Architecture for Genomic Annotation, Visualization, and Exchange) freely available to the life science community in July. Sun Microsystems, the Weizmann Institute of Science, and BioTools have also adopted the standard. The company originally developed AGAVE to build its annotated human genome database and its Prophecy product suite, which includes the database, a query system, and a graphical annotation viewer.

DoubleTwist also released several new data transformation tools and an AGAVE Java library. The library is a Java object model that corresponds to the AGAVE document type definition and provides methods for parsing and manipulating AGAVE XML.
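As a rough illustration of what an object model that mirrors an XML document definition looks like, the following minimal Python sketch parses a small annotated-sequence document into plain objects. The element and attribute names (`annotated_sequence`, `feature`, `label`) and the class names are invented for this example; they are not taken from the AGAVE DTD, and the code is not DoubleTwist's Java library.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

# Hypothetical, simplified markup for illustration only; NOT the real AGAVE DTD.
SAMPLE = """
<annotated_sequence id="X00001" length="1200">
  <feature type="gene" start="101" end="950" label="geneA"/>
  <feature type="exon" start="101" end="400" label="geneA-exon1"/>
</annotated_sequence>
"""


@dataclass
class Feature:
    """One annotation on the sequence (hypothetical structure)."""
    type: str
    start: int
    end: int
    label: str


@dataclass
class AnnotatedSequence:
    """A sequence entry plus its annotations (hypothetical structure)."""
    id: str
    length: int
    features: list = field(default_factory=list)

    def features_of_type(self, type_name):
        """Return all features whose type matches type_name."""
        return [f for f in self.features if f.type == type_name]


def parse_annotated_sequence(xml_text):
    """Build an AnnotatedSequence object from the XML text."""
    root = ET.fromstring(xml_text.strip())
    seq = AnnotatedSequence(id=root.get("id"), length=int(root.get("length")))
    for node in root.findall("feature"):
        seq.features.append(
            Feature(
                type=node.get("type"),
                start=int(node.get("start")),
                end=int(node.get("end")),
                label=node.get("label"),
            )
        )
    return seq


if __name__ == "__main__":
    entry = parse_annotated_sequence(SAMPLE)
    for gene in entry.features_of_type("gene"):
        print(gene.label, gene.start, gene.end)
```

Mapping each element to one class keeps the objects and the document definition in step, which is the general idea behind a DTD-derived library of this kind.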

The two XML standards supported by XEMBL address different needs: While BSML focuses on sequence information, AGAVE is more centered on genomic annotations. Output from publicly available databases can be converted to XML and used in programs that support the AGAVE and BSML formats. While applications that support these standards are so far limited to the DoubleTwist and LabBook viewers, Robinson expects to see a number of new programs appear soon, particularly toolkits that are able to import and export AGAVE and BSML submitted to the BioPerl and BioJava projects. “Having the XML tools packaged in these open source projects will put AGAVE and BSML capabilities on thousands of desktops as people update their libraries,” said Robinson.

Despite the fact that the XEMBL server has yet to be announced publicly, “people have already found it and have been quietly evaluating it,” Robinson said. He added that the EBI has also had discussions with groups who would like to use the XEMBL server in a production environment.

But while enthusiasm for the project is high, Robinson warned that XML and SOAP are still relatively new technologies and “may still prove to be ‘the Emperor’s new clothes’ to our community.” While leading IT companies such as IBM, Microsoft, and Sun are backing XML-based technologies and the Interoperable Informatics Infrastructure Consortium is seeking to base its future standards on XML, Robinson said he remains “cautiously optimistic” about the technology.

“Many people are pinning their hopes on XML and SOAP as a technology that will help the integration of bioinformatics applications … However, unless there’s agreement in our own community on standards for the XML format and SOAP protocol, we’ll still have anarchy. This is where the I3C and OMG may provide a vital role in coordinating efforts and reaching consensus.”

Edward Kiruluta, chief technology officer of DoubleTwist — a participating company in the I3C — said he was “delighted” to have the EBI’s support of the standard. “These partnerships help to make AGAVE an even more robust platform that can enhance collaboration throughout the life sciences community.”

Yet Robinson said he doubts that a single overriding technology will emerge to solve the industry’s data integration problems. “I’m agnostic to the technology used,” he said. “It’s the data that’s important, not the technology used to access it.”

— BT
