DENVER--Concerns about how to handle the rush of new human genome sequence data dominated bioinformatics talk at the annual meeting of the American Society of Human Genetics, held here October 27-31. Speakers contended that the influx of new DNA information is outpacing scientists' ability to use it, and that bioinformatics is a weak link in the genomic sequencing community.
With 7 percent of the Human Genome Project's sequence finished, output is now approaching a pace of 500 megabases a year. "It's going to be coming through the floodgates," noted Andy Baxevanis of the National Human Genome Research Institute (NHGRI).
Institute Director Francis Collins observed that "all this sequence pouring in is really going to stress the system."
"We need better tools that are more user-friendly, so that the average scientist can make the most of this incredible database of information and not be stymied by the lack of good algorithms," he continued. "We don't have a hardware problem. We have a software problem."
Software to pick out exons, the protein-coding regions of genes, isn't bad, Collins said, but other areas need intensive work. "We've got very poor algorithms right now to predict where the regulatory sequences are," he claimed, "and we don't have nearly enough in the way of user-friendly tools to allow people who are sifting through 10 million base pairs trying to find a diabetes gene to figure out how to use this information in the best way."
Baxevanis, author of the new book Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins, reviewed software including GRAIL, GenScan, and MZEF, which help gene sleuths pick out exons amid the vast stretches of noncoding DNA. He also reviewed BLAST, the popular tool for searching databases for homologous sequences and proteins, and Genotator, which runs all these programs together and helps select the best results. Each tool has pitfalls, stressed Baxevanis. "The take-home message is, you have to understand what the methods are doing," he said. "Depending on the data you're putting in, you're going to get different results."
While Celera Genomics did not make any major presentations at the meeting, the Craig Venter/Perkin-Elmer venture, which aims to sequence the human genome within three years, was a strong silent presence. Several NHGRI staffers questioned Celera's whole-genome shotgun sequencing approach. "While that's a very creative idea and has clearly worked well for bacteria--Dr. Venter pioneered it for that--whether it will work for a repeat-laden, large genome like the human is subject of much debate," Collins observed. He asserted that the majority of people who have modeled the genome have concerns about whether it's going to assemble very well.
Collins predicted that final assembly of the overlapping DNA fragments will be Celera's main stumbling block. "When you're doing this kind of assembly, it's really an 'n-squared' problem," he said. "You have to compare every piece to every other piece. So that means if it is a thousand times bigger [than bacteria], it's a million times harder to make the assembly. And that ignores the repeat problem, which is virtually not a bother in the bacterial genome. It's a huge bother in the human."
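The every-piece-against-every-piece arithmetic in Collins's remark can be checked with a short sketch. It models only a naive all-against-all comparison count, which is the picture the quote paints, not any particular assembly program. The function names and the fragment count of 20,000 are hypothetical values chosen for illustration, not figures from the meeting.

```python
def pairwise_comparisons(num_fragments: int) -> int:
    """Count the unordered pairs among num_fragments pieces: n * (n - 1) / 2."""
    return num_fragments * (num_fragments - 1) // 2


def scaling_factor(base_fragments: int, size_multiplier: int) -> float:
    """How many times more comparisons are needed when the number of
    fragments grows by size_multiplier, assuming every piece is
    compared with every other piece."""
    small = pairwise_comparisons(base_fragments)
    large = pairwise_comparisons(base_fragments * size_multiplier)
    return large / small


if __name__ == "__main__":
    # Hypothetical fragment count for a bacterial-scale project.
    bacterial_fragments = 20_000
    factor = scaling_factor(bacterial_fragments, 1_000)
    print(f"1,000 times more fragments -> about {factor:,.0f} times more comparisons")
```

Because the comparison count grows with the square of the number of pieces, a thousandfold increase in fragments yields roughly a millionfold increase in comparisons, in line with the quoted estimate. Like the quote, the sketch leaves repeats out entirely.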
But Collins conceded that Celera has forced the Human Genome Project to accelerate its agenda. "Our five-year plan is probably a little more aggressive than if last May's announcement had not come along," he admitted. "If that gets people's juices flowing and makes people want to go faster, so much the better."
The meeting also reflected growing competition in the field of gene arrays, which continue to tantalize geneticists with visions of testing against thousands of genes at once. Vysis grabbed the most attention by unveiling the first commercial genomic array for detecting abnormal gene-copy numbers; unlike some better-known gene expression arrays, the Vysis system measures amplification of the genes themselves, not their RNA products. The system, called GenoSensor, got a powerful endorsement from NHGRI's Olli Kallioniemi, who said comparative genomic hybridization (CGH) arrays such as Vysis's could be used in tandem with expression arrays to discover more quickly the most important genes, such as erbB2, which is amplified in some cancers.
"Build-your-own" chips emerged as another conference theme. Donna Albertson of the University of California, San Francisco, reported making and testing a CGH array system for chromosome 20 together with the Lawrence Berkeley National Laboratory. "We're interested in developing an array of clones that would actually allow us to scan the whole genome" for copy number changes occurring in cancers and other diseases, she explained.
Stanford's Barbara Dunn built a full genome expression array system for yeast and has made all software and equipment available at http://cmgm.stanford.edu/pbrown/mguide. Tom Kornberg, a Drosophila geneticist from the University of California, San Francisco, praised Dunn's blueprint. "I personally built one of these arrayers with my own hands. It took two weeks once the parts arrived, and it was much cheaper," he reported.
Affymetrix, meanwhile, showed how it continues to refine its DNA expression chip technology. Company biologist Mamatha Mahadevappa described a new method for preparing samples that greatly reduces the amount of starting material for RNA extraction, making it much easier to get a homogeneous RNA sample for analysis. In addition, the company's Janet Warrington previewed the future of chips as miniaturization technology improves.
"In the near future we will be using probes that array tens of thousands of genes," she predicted, "and perhaps the entire human genome."