OAKLAND, Calif.--Bioinformatics industry insiders expressed skepticism this month after DoubleTwist announced that it had used a proprietary computational method to identify "with high confidence" 65,000 genes, as well as another possible 40,000 genes, in the Human Genome Project's draft sequence data. The company, which has positioned itself as an internet-based research portal, said it intends to package and sell its results as the DoubleTwist Human Genome Database.
Widespread news media reports that heralded DoubleTwist as a new leader in the human genome race provoked ridicule from some in the industry, who accused the company, which is rumored to be planning an initial public offering, of putting a spin on the news to generate investor interest. Publications that made note of DoubleTwist's accomplishment included BBC News, Business Week, Forbes, Time Magazine, and the New York Times.
"They have done a masterful job of blurring the distinction between what they have and what the draft genome sequence is," contended Cyrus Harmon, CEO of Neomorphic.
To be sure, DoubleTwist officials posted an explanation of their press release on their website, noting the need to distinguish their own "important achievement from other ongoing genomic projects," and warned BioInform not to trust any media reports.
But many bioinformatics insiders, perhaps not as sales-savvy as DoubleTwist's CEO John Couch, a longtime computer-industry exec, consider the company's marketeering an affront to traditional scientific protocol. Harmon and others accused DoubleTwist of doing "science by press release."
Todd Smith, president of software company Geospiza and a former employee of Pangea Systems, asked, "Where's the peer-reviewed work that supports their claim?" He added, "There's supposed to be something pure and good about science. There are sacred cows in academia." Making claims to the press without publishing your methods, he suggested, is not a way to win friends in science.
Winning customers, confusing competitors
Ultimately, however, it is customers, not competitors, that DoubleTwist needs to win over. Whether pharmaceutical and biotechnology companies will decide that access to DoubleTwist's data is worth $10,000 per seat, or $650,000 annually for a server-installed database, remains to be seen.
While acknowledging that its per-user subscription fee is reasonable, some in the industry who spoke to BioInform wondered how DoubleTwist's annotation data compares to what is already available for free from the European Bioinformatics Institute's Ensembl database or the Oak Ridge National Laboratory's Genome Channel and Genome Catalog repositories.
Nick Tsinoremas, DoubleTwist's director of research, said the difference is in how annotation is conducted. "We think Ensembl is doing a good job, but we're doing several orders of magnitude more thorough [analysis]. Ensembl is relying on one gene prediction algorithm, and we have a robust combination to find genes," he said.
Tsinoremas told BioInform that DoubleTwist took a "three-bucket approach" to annotating human genes. After refining and cleaning raw data from the US National Institutes of Health's UniGene set of gene-oriented clustered data, DoubleTwist ran algorithms to predict exons, introns, and coding regions.
He said the company applied a "super algorithm" that combined its proprietary programs with several publicly available gene prediction algorithms. Stanford University's Genscan tool was one of those, but Tsinoremas declined to name the others.
In another "bucket," DoubleTwist identified genes by DNA homology using clustering and alignment tools developed by Pangea. DoubleTwist also searched protein databases to assign putative function and give more information about gene structure before putting the fragments in order.
Tsinoremas said that results of an initial analysis of chromosome 22 are available at DoubleTwist's website and that the company is preparing a paper summarizing the statistical results of its work for peer review. But Rob Williamson, chief operating officer, said the company is "circumspect" about revealing "some of the methodology."
The secrecy only piques competitors' suspicion. A bioinformaticist at a competing company said he was "dubious" about DoubleTwist's announcement. "How rigorous are their methods? Exactly how have they done this analysis?" he asked.
Ed Uberbacher, who led the public Genome Annotation Consortium at Oak Ridge National Laboratory and now heads Genomix, a company that recently licensed genome annotation technology to Celera (see story p. 2), warned, "Analysis of draft data is a problem because the gene count will differ. It's very approximate and is likely to be overestimated--you'll have pieces of the same gene in different fragments." Because the public project has generated only 3X coverage of the genome thus far, Uberbacher said it is "very rough."
"What they're doing is important, but the question is, What are the methods they used and how high quality are the results?" observed Harmon, whose company is busy applying its own software, used to annotate the Drosophila genome, to human genome data.
But Terry Gaasterland, a computational genomics assistant professor at Rockefeller University and a scientific advisor to DoubleTwist, said the company deserved credit for creating a useful library. "It's a lot of sweat work to execute all the different sequence analysis tools for assembled sequence data," she said. "The major genome centers have put together pipelines for running programs, but what DoubleTwist has done that is a service is taken all the available data, put it in one place, and run their own very good, competitive clustering and alignment tools on it."
And, Gaasterland noted, by carrying out the grunt work of assembling data, running tools to predict genes, and running sequence analysis tools to gather evidence, DoubleTwist has laid the groundwork for eventually annotating the finished human genome sequence.