HINXTON, UK--The implications of the speedup of various genome projects were well reflected in this year’s Genome-Based Gene Structure Determination symposium, held June 1-2 on the Wellcome Trust Genome Campus here. The event, organized by Alphonse Thanaraj of the European Bioinformatics Institute, evolved from earlier workshops the EBI held for its industry associates. Within one year, gene prediction and annotation have shifted from primarily DNA-based to genome-based, with refined methodologies and genome comparisons now part of semi-automatic genome annotation pipelines.
The symposium, which attracted close to 200 participants from both academic and commercial bioinformatics backgrounds, focused on state-of-the-art approaches to automatically annotating large genomes. The new journal Briefings in Bioinformatics will dedicate a whole issue at the end of this year to the advances in gene prediction that were revealed during this symposium.
The meeting featured 13 invited speakers and 24 specialist poster presentations.
Gene prediction methods
The first day was dedicated to methodologies in gene prediction. Michael Zhang of Cold Spring Harbor Laboratory in New York described the application of discriminant analysis to DNA sequence motif recognition, especially with regard to splice sites and exons in the human genome. Thomas Werner of GSF in Germany addressed the difficult task of identifying the DNA sequence elements involved in transcription, which is accomplished by modeling the organization of promoters and using these versatile models to predict promoters in genomic sequence.
Anders Krogh of Denmark’s Center for Biological Sequence Analysis showed how combining ab-initio gene finding with database matches and other external information clearly improves performance and can even accommodate errors and uncertainties.
“Large-scale genome annotation projects” was the theme of day two of the meeting. Suzanna Lewis of the Berkeley Drosophila Genome Project spoke about managing such projects, as exemplified by the collaboration between Celera Genomics, the Berkeley project, and a team of experts that helped transform the process of annotation into a high-throughput operation.
The Genome Channel and Genome Catalog, developed and maintained by the Genome Annotation Consortium, were presented by Edward Uberbacher of the Oak Ridge National Laboratory in Tennessee. In addition to human sequence, these tools present all completed microbial genomes in a richly annotated view.
Richard Durbin of the Sanger Centre presented gene prediction in the C. elegans genome. This was initially done with Genefinder, supplemented with other tools. A new system called GAZE, now in development, uses flexible state specification to define a gene and also takes prediction information from other sources into account.
The representation of gene function and process in genome databases was outlined by Michael Ashburner of the EBI. A Gene Ontology has been developed that will allow researchers to consistently annotate and query gene functions, processes, and cellular components in genomic databases.
Annotation and chromosome-wide analysis of human chromosome 22 were presented by Tim Hubbard of the Sanger Centre. Lessons learned from the chromosome 22 work indicate that automatic gene prediction is not reliable enough to predict genes on its own, but is a very valuable tool when combined with other information. At the Sanger Centre, ab-initio gene finding is closely integrated with experimental work to annotate chromosome 22.
A larger conference on this topic is planned for June 12-15, 2001, again on the Wellcome Trust Genome Campus.
--Jean-Jack M. Riethoven