WASHINGTON, April 18 - Craig Venter chastised an audience of computational biologists at a meeting here today for failing to respond to the "shady statistical arguments" used in a recent attack on Celera's whole-genome shotgun technique.
"It's an embarrassment that there has not been more of a response from all of you," Venter told attendees of the sixth annual International Conference on Computational Molecular Biology, which kicked off today. He was referring to a paper published last month in the Proceedings of the National Academy of Science in which scientists from the public human genome-sequencing consortium criticized Celera's use of public data to prove the success of its own approach.
"[You] should be offended by the [paper's] use of mathematics to fool people," Venter fumed at the audience of mathematicians, statisticians, and algorithmists.
Later, in an interview with GenomeWeb, Venter said the PNAS paper is an example of scientists "using statistics to lie" and causing the general public to be "fooled by people with agendas" in matters that may not be so academic. The paper was co-authored by Washington University's Robert Waterston, Whitehead Institute's Eric Lander, and the Sanger Institute's John Sulston.
"All of us have the responsibility to educate the broader public in these issues," Venter said in the interview.
In his talk, Venter stressed the vital role mathematical and computational approaches will continue to play as genomics evolves. While improved assembly and gene-prediction algorithms were the key to Celera's success in sequencing the Drosophila, human, and mouse genomes, Venter said future computational challenges in biology wouldn't be so easy to identify.
With proteomic data estimated to reach the petabyte scale and individual genetic variation data predicted to rise into the exabytes, Venter said, "We don't have the computers, the database systems, or the computational tools to deal with this information now, let alone for what's coming."
But he said a more important issue than computational power or algorithm development will be ensuring that future genomic research is based on a strong statistical foundation. Noting that researchers face the risk of "dangerous overinterpretation of genetic differences," Venter warned audience members that they bear the responsibility for keeping scientists honest.
Because the risk of developing certain diseases depends on environmental as well as genetic factors, he said, computational biologists "will have to learn to deal with statistical probability and interpret that for the general public" in order to prevent the perception of genetic determinism.