Since its first demo at BOSC 2000 a year ago, Lincoln Stein's distributed sequence annotation system (DAS) has rapidly gained support throughout the bioinformatics community as an effective way to allow third-party genomic annotators to integrate and view their annotations along with those of other researchers.
The client-server system enables a single machine to gather annotation information from multiple web sites, collate the information, and display it in a single view. Servers are currently running version 1 of DAS, written by Stein, Robin Dowell, Sean Eddy, and Rodney Jokerst, at WormBase, FlyBase, Ensembl, the Institute for Genomic Research, and the University of California, Santa Cruz.
But popularity has its downside. "People expect DAS to do more than it was really designed to do," Stein told BioInform. It seems that as researchers continue to grapple with the challenges of data integration and interoperability, many are turning to DAS for lack of a better alternative. At almost every bioinformatics meeting or conference, DAS pops up as an example of a successful approach to data integration, but only when Stein isn't present to keep proponents of the system grounded.
At the recent Model Organism Bring Your-own Database Interface Conference, for example, Stein found that attendees were looking to DAS as a one-stop solution to the problem of database interoperability. "I had to let them down easily," he said. "[DAS] wasn't designed to detect all the contradictions between the Celera and the Human Genome Project assembly. It wasn't designed to resolve contradictions between people's protein predictions and tell you which is the right one. It isn't designed for genetic maps or protein domains. It really was designed for sequence browsing," Stein said.
While DAS uses a number of tricks to resolve the problems that occur when one person annotates on one version of a map and another person annotates on another, Stein said broader integration problems are outside the realm of the system's capabilities. "I think it has simplified the life of the big annotation display engines because they don't have to devote people to the task of importing data," Stein said.
"We have a full-time person here on WormBase and all he does is take data files from other groups and reformat them so we can display them. If we could convince everybody to use DAS then we wouldn't have to do that at all."
But Jim Kent, who heads up the UCSC genome browser, said that running the DAS server hasn't made his life much easier. "It's yet another format and it's huge," Kent said. Due to the XML format and redundancies in the system, Kent said that the DAS representation of the UCSC database is about 15 times as large as the normal representation. He noted, however, that it does compress well.
The size wouldn't be a problem, Kent noted, if DAS were being used as intended. "It was originally designed to look at a little bit at a time," he said. "But since it's already there people are starting to use it as a standardized way to grab everything."
Kent said he put the server out in anticipation of the problems that would occur as the system scaled from WormBase to the human genome. "The human gene annotations are 100 times as large as the worm gene annotations. I felt that it would get the people writing the various DAS servers working on the problem of being able to cope with that early."
Both Kent and Stein agree that many of the kinks have been worked out, but there's plenty of room for improvement. Stein and the other DAS developers issued a request for comments at biodas.org to gather a wish list of features for version 2. Stein said he's reading through the RFCs now and hopes to have an idea of what features will be included by early January. Following another round of RFCs, the specification should begin to take shape by the end of February. Features most likely for DAS 2 include an improved installation toolkit, an annotation ontology, an expanded array of server types, and an extended coordinate system.
Those interested in getting involved with DAS should be warned that it changes often: version 1.0 was barely up on September 30 before Stein's last-minute tinkering led to the release of 1.01 on October 1. This methodology is not to everybody's taste. "I wish they had thought about it, made sure that it scaled well and then released a stable version," said Kent. "Instead it was released at 0.94. I had to change the code to 0.95. Now it's up to 1.01." While admitting that the system is stabilizing, Kent suggested jokingly that programmers should not be allowed to have fractional releases.
However, Stein is following a successful precedent. "That's how TCP/IP was designed," he said. "In little bits and pieces."
BT