ROCKVILLE, Md.--The Institute for Genomic Research and Neomorphic of Berkeley, Calif., have teamed up to develop new annotation software with the immediate goal of assisting in TIGR's Arabidopsis sequencing project. In the future, however, both organizations plan to apply it to other genomes, such as the rice and human genomes, David Kulp, Neomorphic's vice-president of research, told BioInform. "TIGR needed a mechanism to allow its biologists to sit down in front of the data, look at all of the available evidence, and make curated annotations," observed Kulp.
Neomorphic has developed and shared a working version of the Java-based application with TIGR, which is scheduled to receive a full-featured application by year-end. After that, Neomorphic will sell the software as part of its product line.
As part of its Arabidopsis initiative, TIGR pooled sequence information into a database, combining data from various academic-led genome projects and its own effort to sequence chromosome 2 of the plant. But because the research had been done using different systems for sequencing, annotating, and storing data, TIGR soon found that its in-house annotation software was not the optimal solution.
Deciding that outside help was needed, TIGR partnered with Neomorphic to develop what is known as the Annotation Station, which is expected to replace TIGR's existing annotation system--"probably one of the better systems" in use among public groups doing annotation work, claimed Kulp. That said, the older tool is limited by its design and lack of flexibility and would require a big effort to modify, he added.
Xiaoying Lin, assistant investigator and head of the Arabidopsis annotation operation at TIGR, said the new software is working "nicely" and confirmed that plans call for Annotation Station to take over from TIGR's other annotation package. Lin explained that Neomorphic's software communicates with a database via middleware being developed by TIGR that puts the data in a format the Annotation Station can handle. The collaboration, Lin said, combines TIGR's annotation database experience with Neomorphic's knowledge of biological data display.
The software aims to let a biologist monitor the sequencing process as bacterial artificial chromosomes (BACs) are assembled into contigs, and to keep following it as the process continues beyond that stage. Another objective is to allow scientists to annotate directly at the base level as well as to start from the chromosome and drill down to the base, added Kulp.
The application allows a user to inspect individual chromosomes, contigs, and BACs; pull out maps in the individual BACs; and then edit the annotation on that sequence. The editing is "fairly sophisticated" in that the user can edit more than just the genome's coding regions. "They're talking about adding curations for the full transcript as well as potentially any regulatory regions or other important details on the genomic sequence," Kulp remarked.
A guiding principle for application development was to tie together evidence with "biological expertise, or the knowledge that a biologist brings to the problem," said Kulp. The software lets the user see all of the automated analyses that have been performed on the genomic sequence in a unified way. "Unlike an application that you might have for just looking at your data, this allows you to assess evidence and then make a curated annotation on the data, which is significantly different from only being able to view individual results, for example, of analysis," said Kulp.
Another important design principle was to keep underlying data representations broad enough to cover a number of genomes. Of course, with higher organisms there will be much more data, but the system has been developed to manage that, said Kulp.
Cyrus Harmon, Neomorphic's president and CEO, agreed that some choices had to be made to suit general needs rather than adding individual functions for a particular organism. "We've tried to stay fairly general and not add too many Arabidopsis-specific features so that we've designed it with an eye towards applicability to other genomes. Of course there comes a time when you probably do want to add in some organism-specific features," he stated.