BALTIMORE--The Third Annual Conference on Computational Genomics will take a broad look at the bioinformatics issues central to comparative genomics, reflecting advances made in a number of genome projects over the past year, said conference cochair Steven Salzberg, director of bioinformatics at the Institute for Genomic Research (TIGR), which sponsors the event. About 300 attendees and eight exhibitors are expected at the November 18-21 conference, which is being held here for the first time.
Twenty-two speakers from academia, government, and industry are scheduled to give presentations covering comparative genomics, data mining and visualization, expression analysis, functional analysis and annotation, gene discovery, map and sequence integration, pathways and networks, protein structure prediction, proteomics, genomic databases, and statistical sequence analysis. In addition, about 49 posters and 13 electronic posters will be presented.
Salzberg, who will cochair the conference with Anthony Kerlavage of Celera Genomics and David Searls of SmithKline Beecham Pharmaceuticals, said the meeting will reflect a shift in the research community toward "larger scale" comparisons of genomes. "Now that we're up to over 20 genomes, people are starting to develop computational methods for comparing one genome to another, for making evolutionary conclusions based on that, for quickly computing polymorphisms between organisms, and for looking at things like synteny between organisms and among multiple genomes," he explained. With the data now available, researchers can study synteny between the human and mouse genomes, or compare closely related organisms such as bacteria across their entire genomes, down to single-base changes.
Larger scale comparisons
Several talks will discuss how to compare one whole genome to another, a question that wasn't "even meaningful a few years ago," Salzberg noted. "In the last year or two, it's become very meaningful because we have a lot of genomes to look at," he added.
Webb Miller of Pennsylvania State University will discuss comparing human and mouse sequences, and Bob Mau of the E. coli Genome Center at the University of Wisconsin will talk about using suffix arrays in whole-genome comparisons. Of the conference's four tutorials, one, led by David Sankoff of the University of Montreal, will cover genome rearrangements and some of the computational methods needed to compare genomes.
Two other tutorials will open the conference: Simon Kasif of Compaq and the University of Illinois will speak on "Datamining in Bioinformatics," and Gary Bader of the University of Toronto and Christopher Hogue of the Samuel Lunenfeld Research Institute will present "BIND, the Biomolecular Interaction Network Database."
A Friday morning session will include comparative genomics talks by David States of Washington University in St. Louis; Penn State's Miller; the University of Wisconsin's Mau; Janan Eppig of the Jackson Laboratory on "The Mouse Genome Database: Integrating Genomic and Biological Knowledge"; and Owen White of TIGR on "Omniome: Querying and Visualizing Fully Sequenced Microbial Genomes."
Friday afternoon sessions will include Shankar Subramaniam of the University of Illinois on "Sequence-Structure Mapping: Lessons from Bioinformatics"; Thomas Lengauer of the German National Research Center for Information Technology on "Structure-based Methods for Finding Target Proteins and Lead Structures for Drug Design"; Ying Xu and Dong Xu of the Oak Ridge National Laboratory on "Protein Structure Determination Using Limited NMR Data and Protein Threading"; Henrik Nielsen of the Center for Biological Sequence Analysis at the Technical University of Denmark on "Prediction of Signal Peptides and Signal Anchors by Neural Networks and Hidden Markov Models: the New SignalP"; and James Garrels of Proteome on "BioKnowledge Library: Annotated Model Organism Databases for Comparative Genomics." The day's agenda will close with Sankoff's tutorial.
The third plenary session, on Saturday, will feature Peer Bork of the Biocomputing Molecular Biology Lab on "Comparative Sequence Analysis: From Polymorphism to Protein Domain Annotation"; Kimmen Sjolander of Molecular Applications Group on "Issues in Target Identification and Prioritization"; Andrea Califano of IBM's T.J. Watson Research Center on "G-protein Coupled Receptor"; Pankaj Agarwal of SmithKline Beecham on "Gene Prediction Accuracy in Large Genomic Sequences"; and Masaru Tomita of the Laboratory for Bioinformatics at Keio University, Japan, on "Computational Genomics of Non-Coding Sequences."
Later that day, Temple Smith of Boston University will lecture on "Protein Domain Dissection and Functional Identification," and David Eisenberg of the University of California, Los Angeles, will talk about "Protein Functions in Yeast and Mycobacterium tuberculosis."
The conference winds up on Sunday with talks from John Moult of the University of Maryland; Bill Grundy of Columbia University on "Analysis of Microarray Gene Expression Data Using Support Vector Machines"; Fredj Tekaia of the Pasteur Institute on "The Genomic Tree"; and Martin Huynen of the European Molecular Biology Laboratory on "Comparative Genomic Analysis."
Testing tool precision
Nomi Harris of the Berkeley Drosophila Genome Project will give the final tutorial on "The Genome Annotation Assessment Project," which, according to Salzberg, set out to determine how accurate the research community's tools are. For some time, Salzberg noted, computational genomics researchers have questioned the precision of many existing annotation tools, both for locating genes in a genome and for assigning functions to those genes.
"With the amount of data now in the public domain, it's becoming more and more urgent that we improve those tools because biologists rely on the results of these computational algorithms. And the biologist doesn't really have the background to understand where the computational method might have gone wrong," Salzberg observed. "So they just have to trust it, or else they can use nothing."
Using GenBank as an example, he said a researcher has no easy way of checking the accuracy of results received from a search. GenBank doesn't show where a functional assignment originated, he noted. Usually it comes from someone else's database match, which would be acceptable if everything in GenBank were correct. But that's not the case, said Salzberg, adding that the problem could get worse. "The more we put inaccurate data in there, the more difficult it's going to be in the future to unravel all that and fix it. A lot of the work in computational genomics is aimed at coming up with better computational methods that will allow you to get more accurate assignments of function for proteins," he said.
As bioinformatics grows and new university programs bring more people into the field, Salzberg expects conference attendance to rise. Asked whether this meeting could become as large as TIGR's Genome Sequencing and Analysis Conference (GSAC), which drew about 2,100 people to its annual event in September, he declined to make any bold predictions. He noted, however, that after three years GSAC had only about 100 to 150 attendees, a mark this meeting has already surpassed. "It depends on too many factors for me to predict but I think it will grow quite a bit," he said.