As Servers Multiply, Stein Sees Expectations for Distributed Annotation System Grow

Since its first demo at BOSC 2000 a year ago, Lincoln Stein’s distributed sequence annotation system (DAS) has rapidly gained support throughout the bioinformatics community as an effective way to allow third-party genomic annotators to integrate and view their annotations along with those of other researchers.

The client-server system enables a single machine to gather annotation information from multiple web sites, collate the information, and display it in a single view. Servers are currently running version 1 of DAS — written by Stein, Robin Dowell, Sean Eddy, and Rodney Jokerst — at WormBase, FlyBase, Ensembl, the Institute for Genomic Research, and the University of California, Santa Cruz.
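The collation step described above can be illustrated with a minimal Python sketch. It is not the DAS protocol or any DAS client's actual code: the server names, the feature records, and the `collate` function are all hypothetical, and the "responses" are hard-coded rather than fetched over the network. It only shows the idea of merging features from several sources into one position-sorted view of a region.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Feature:
    source: str  # which (hypothetical) annotation server supplied the feature
    start: int   # 1-based inclusive start on the reference sequence
    end: int     # 1-based inclusive end on the reference sequence
    label: str


def collate(sources: Dict[str, List[dict]], start: int, end: int) -> List[Feature]:
    """Merge features from every source that overlap [start, end], sorted by position."""
    merged = []
    for name, features in sources.items():
        for f in features:
            # Keep only features that overlap the requested region.
            if f["start"] <= end and f["end"] >= start:
                merged.append(Feature(name, f["start"], f["end"], f["label"]))
    return sorted(merged, key=lambda f: (f.start, f.end, f.source))


# Made-up annotations standing in for the responses of two servers (assumption).
EXAMPLE_SOURCES = {
    "server_a": [
        {"start": 100, "end": 500, "label": "gene_prediction_1"},
        {"start": 5000, "end": 5200, "label": "gene_prediction_2"},
    ],
    "server_b": [
        {"start": 300, "end": 450, "label": "est_match_1"},
    ],
}

# A single combined view of positions 1..1000 drawn from both sources.
view = collate(EXAMPLE_SOURCES, 1, 1000)
for feature in view:
    print(feature.source, feature.start, feature.end, feature.label)
```

In this sketch every source is assumed to share one coordinate system; the version mismatches Stein mentions later are exactly the case it does not handle.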

But popularity has its downside. “People expect DAS to do more than it was really designed to do,” Stein told BioInform. It seems that as researchers continue to grapple with the challenges of data integration and interoperability, many are turning to DAS for lack of a better alternative.

At almost every bioinformatics meeting or conference, DAS pops up as an example of a successful approach to data integration — but only when Stein isn’t present to keep proponents of the system grounded. At the recent Model Organism Bring Your-own Database Interface Conference, for example, Stein found that attendees were looking to DAS as a one-stop solution to the problem of database interoperability.

“I had to let them down easily,” he said.

“[DAS] wasn’t designed to detect all the contradictions between the Celera and the Human Genome Project assembly. It wasn’t designed to resolve contradictions between people’s protein predictions and tell you which is the right one. It isn’t designed for genetic maps or protein domains. It really was designed for sequence browsing,” Stein said.

While DAS uses “a number of tricks” to resolve the problems that occur when one person annotates on one version of a map and another person annotates on another, Stein said broader integration problems are outside the realm of the system’s capabilities.

“I think it has simplified the life of the big annotation display engines because they don’t have to devote people to the task of importing data,” Stein said. “We have a full-time person here on WormBase and all he does is take data files from other groups and reformat them so we can display them. If we could convince everybody to use DAS then we wouldn’t have to do that at all.”

But Jim Kent, who heads up the UCSC genome browser, said that running the DAS server hasn’t made his life much easier. “It’s yet another format and it’s huge,” Kent said. Because of the XML format and redundancies in the system, Kent said, the DAS representation of the UCSC database is about 15 times as large as the normal representation. He noted, however, that it does compress well.
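The kind of overhead Kent describes can be seen in a small Python sketch that serializes the same records twice, once as tab-delimited text and once as XML, and then compresses the XML. The records are made up, and the element names are illustrative assumptions, not the actual DAS schema, so the ratios it prints say nothing about the real 15-fold figure for the UCSC data.

```python
import gzip
import xml.etree.ElementTree as ET

# Made-up feature rows: (id, start, end, strand). Purely illustrative data.
ROWS = [(f"feat{i}", i * 100, i * 100 + 80, "+") for i in range(1000)]


def as_tab(rows):
    """Serialize rows as compact tab-delimited text, one feature per line."""
    return "\n".join("\t".join(str(v) for v in row) for row in rows).encode()


def as_xml(rows):
    """Serialize the same rows as XML, with hypothetical element names."""
    root = ET.Element("FEATURES")
    for ident, start, end, strand in rows:
        feat = ET.SubElement(root, "FEATURE", id=ident)
        ET.SubElement(feat, "START").text = str(start)
        ET.SubElement(feat, "END").text = str(end)
        ET.SubElement(feat, "ORIENTATION").text = strand
    return ET.tostring(root)


tab_bytes = as_tab(ROWS)
xml_bytes = as_xml(ROWS)
xml_gz = gzip.compress(xml_bytes)

print("tab-delimited bytes:", len(tab_bytes))
print("XML bytes:          ", len(xml_bytes))
print("gzipped XML bytes:  ", len(xml_gz))
```

The repeated tag names account for most of the extra bytes, and that repetition is also why a general-purpose compressor shrinks the XML so effectively.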

The size wouldn’t be a problem, Kent noted, if DAS were being used as intended. “It was originally designed to look at a little bit at a time,” he said. “But since it’s already there people are starting to use it as a standardized way to grab everything.”

Kent said he put the server out in anticipation of the problems that would occur as the system scaled from WormBase to the human genome. “The human gene annotations are 100 times as large as the worm gene annotations. I felt that it would get the people writing the various DAS servers working on the problem of being able to cope with that early.”

Both Kent and Stein agree that many of the kinks have been worked out, but there’s plenty of room for improvement. Stein and the other DAS developers issued a request for comments at biodas.org to gather a wish list of features for version 2. Stein said he’s reading through the RFCs now and hopes to have an idea of what features will be included by early January. Following another round of RFCs, the specification should begin to take shape by the end of February.

Features most likely for DAS 2 include an improved installation toolkit, an annotation ontology, an expanded array of server types, and an extended coordinate system.

Those interested in getting involved with DAS should be warned that it changes often — version 1.0 was barely up on September 30 before Stein’s last-minute tinkering led to the release of 1.01 on October 1. This methodology is not to everybody’s taste. “I wish they had thought about it, made sure that it scaled well and then released a stable version,” said Kent. “Instead it was released at 0.94. I had to change the code to 0.95. Now it’s up to 1.01.” While admitting that the system is stabilizing, Kent suggested jokingly that “programmers should not be allowed to have fractional releases.”

However, Stein is following a successful precedent. “That’s how TCP/IP was designed,” he said. “In little bits and pieces.”

— BT
