NEW YORK, Feb 11 - The human genome sequencing race may be over, but as Celera and the Human Genome Project prepare to unveil their official accounts of their sequencing efforts in competing journals this week, the battle to claim first prize continues to smolder.
In Celera's paper, "The Sequence of the Human Genome," which will appear in the February 16 issue of Science, the company's scientists argued that their success in sequencing over 90 percent of the genome vindicated their whole-genome shotgun method as well as the modified shotgun method that involved mapping shotgun random reads onto a scaffold of contiguous BACs obtained from the public project.
But Human Genome Project scientists disagreed, saying that Celera's paper actually vindicates their BAC clone- and map-based method of genome sequencing.
"The take-home message is that you need a map to assemble the genome," said Tim Hubbard, director of genomic analysis at the Sanger Center. "They took the amount of data we assembled, [and their own], and they got the same length. We would at least expect them to extend ours. They didn't. They ended up with more gaps than we did."
The Human Genome Project's main paper, "Initial Sequencing and Analysis of the Human Genome," will appear in the February 15 issue of Nature along with 23 ancillary papers and comments.
Scientists from the public and private effort are, however, scheduled to call at least a temporary truce on Monday for a joint press conference in which the researchers will discuss their findings.
The genome project's leaders decided to submit the papers to Nature in December, abandoning their initial plans to publish in Science after the journal decided to allow Celera to publish its sequence on its website, restricting access to those who sign an agreement not to redistribute the data or use it for commercial purposes. Previously, authors who submitted gene sequence data had been required to submit it to GenBank.
Even though two months have passed since the controversy erupted over Science’s decision to accept Celera's terms, the furor that decision provoked among the public project researchers has not subsided.
"If this model is allowed to propagate, it's going to be a mess," said Robert Waterston, director of the Washington University Genome Sequencing Center and one of the Genome Project's key authors on the sequencing paper. "Imagine a world where there's a [different] database for every organism that's been sequenced. Even though it's accessible, it's inconvenient."
In an editorial that accompanies Celera's paper, Science editor Donald Kennedy and Barbara Jasny, the supervisory senior editor in charge of genome-related papers, defended Science's policy. "Had the Celera data been kept secret, it would have been a serious loss to the scientific community," Kennedy and Jasny wrote. "We hope that our adaptability in the face of change will enable other proprietary data to be published after peer review, in a way that satisfies our continuing commitment to full access."
In its 48-page paper, authored by Craig Venter, Mark Adams, Eugene Myers, and a host of other scientists, the company asserts that it completed in nine months what the Genome Project had set out to do in 15 years.
In discussing the origin of the whole genome shotgun method, the paper portrays the Celera scientists as visionaries who persisted with their unorthodox plan despite opposition from the scientific community.
"In 1997, [James] Weber and Myers proposed whole-genome shotgun sequencing of the human genome," the paper's authors write. "Their proposal was not well received. However, by early 1998, as less than 5% of the genome had been sequenced, it was clear that the rate of progress in human genome sequencing worldwide was very slow, and the prospects for finishing the genome by the 2005 goal were uncertain."
Ultimately, Celera sequenced over 90 percent of the genome covered in scaffolds of over 100,000 base pairs, and 25 percent in scaffolds of 10 million base pairs or larger as of October 1, 2000. They found 26,588 genes "for which there was strong corroborating evidence" and 12,000 predicted genes. The paper's authors also said Celera has found 2.1 million SNPs, one percent of which resulted in protein variation.
Celera did, however, concede that the compartmentalized shotgun method it employed using BACs from the human genome project "was a few percentage points better in terms of coverage and slightly more consistent than" the whole genome assembly method. But Celera just as quickly dismissed the BAC-clone approach, saying that "the cost and overall efficiency of clone-by-clone approaches makes them difficult to justify as a stand-alone strategy."
The Human Genome Project's paper described and defended its combined shotgun-BAC cloning process, saying it would ultimately require less work to finish the sequence because repeated segments and misassemblies could be more easily sorted out with BACs. The authors also said that this method would make it possible to avoid mixing up DNA from a variety of people and would make the work easier to divide up among an international consortium.
The Human Genome Project reported in its 63-page paper that it had sequenced 90 percent of the genome to at least draft stage, and had finished a third, as of October 7, 2000. The authors estimated, based on a combination of known genes and those predicted using Ensembl software, that they had found 24,500 actual genes, but put the total number of human genes somewhere between 30,000 and 40,000. The Project also reported finding 1.4 million SNPs, substantially fewer than Celera's researchers.
For final assembly, Celera reported that they mapped a scaffold of BAC contiguous reads onto two maps, GeneMap99, which was developed by the publicly-funded International RH Mapping consortium, and another one developed by Washington University.
This final assembled sequence, the authors said, had fewer gaps than the Human Genome Project sequence, except on chromosomes 21 and 22, the two most complete chromosomes in the Human Genome Project sequence, where Celera said its own sequence had more gaps.
James Kent, the University of California Santa Cruz developer of the Golden Path genome browser, said this small difference between Celera's results and the public project's data showed that the public project could easily catch up.
"The Celera [assembly] is about two to three percent better than the public one," he said. "And I don't know what their strategy is to improve it. Meanwhile, [the Genome Project's assembly] is a working draft, and we are committed to finishing it."
Commenting on the upside of Celera's approach, the authors said: "If the raw sequence reads from the whole-genome shotgun component are made available, it may be possible to evaluate the extent to which the sequence of the human genome can be assembled without the need for clone-based information. [This would] help to refine sequencing strategies for other large genomes."
Nevertheless, in interviews with GenomeWeb several members of the public project insisted on getting the credit for contributing much more than Celera to the genome effort, not just in terms of mechanical sequencing, but also in analysis of the genome.
"The bulk of [Celera]'s sequencing paper describes how they did the whole genome shotgun and sequence assembly," said Bruce Roe, director of the Advanced Center for Genome Technology at the University of Oklahoma. "The question is what now comes out of the Science paper. The answer is very little."
The Genome Project's paper, on the other hand, included about 40 pages of scientific analysis of the genome, Roe noted.