NEW YORK, Feb 11 - While Celera's scientists have refocused their efforts from human genome sequencing to the challenges of functional genomics, applied genomics, and proteomics, the Human Genome Project's army of scientists plans to continue sequencing the genome, a task that could take another two years.
Celera's scientists write in their paper on the sequence of the genome, which is to be published in Science on February 16, that the clear next step is to "define the complexity that ensues when this relatively modest set of 30,000 genes is expressed." The Human Genome Project will publish its papers in Nature on February 16, while the public and private efforts will hold a joint press conference to discuss their findings in Washington on Monday.
Recently, Celera undertook several collaborations aimed at attacking this broader problem of complexity, including a collaboration with Compaq and Sandia Laboratories to develop a life sciences computer and a collaboration with the Institut de Recherche Pierre Fabre in France to analyze how particular genetic variations affect breast cancer treatments. Celera also launched a massive effort aimed at sequencing the human proteome.
The Genome Project researchers, however, believe that their clear next step is to finish the work they started and produce a finished copy of the human genome by 2003.
"We got the low-hanging fruit of the genome," said Robert Waterston, director of the Washington University Genome Sequencing Center and one of the Genome Project's key authors on the paper. "Ninety percent is plenty good enough to give us good insight into the overall composition of the human genome. But we didn't start out to have our project end in a draft sequence."
Currently, 95 percent of the genome has been sequenced in some form, Waterston said. But there are still gaps.
"The last four percent is the hardest to estimate, because we don't really have it in hand," he said.
Some of this DNA has so far been impossible to clone in BACs. This problem is nothing new, as scientists sequencing the C. elegans worm found that up to 20 percent of its DNA resisted cloning in BACs, Waterston said. The researchers tried yeast clones for these regions with success, but this method may not work for the human genome.
Aside from this cloning problem, the most difficult regions remaining to be sequenced and mapped include long "repeats," or sections of the genome that involve repeated nucleotide sequences such as CACACA..., repeated sequences interspersed within other sections of code, blocks of 10 to 300 kilobases that have been copied from one region of the genome to another, or tandem repeat sequences at the telomeres, at the centromeres and in extrachromosomal DNA.
The Washington University Sequencing Center just finished the Y chromosome, and in doing so, "we learned a lot of things about how to finish nasty repeats," said Rick Wilson, co-director of the center. The two upper branches of the chromosome contain long inverted repeat sequences of up to a half megabase, or four BACs long, that seem to mirror one another, Wilson said. In order to determine where each long repeat went, the sequencers at Washington University had to painstakingly sequence entire clones, to find the one in 50,000 bases in which these repeats differed.
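The comparison Wilson describes, hunting for the rare base at which two near-identical repeat copies differ, can be illustrated with a minimal sketch. It assumes the two copies have already been sequenced and aligned to the same length; the function name and the toy sequences are hypothetical and are not drawn from the center's actual finishing software.

```python
def differing_positions(copy_a: str, copy_b: str) -> list[int]:
    """Return 0-based positions at which two aligned repeat copies differ.

    Assumption: both copies are already aligned and of equal length.
    Real finishing work would need a proper alignment step first.
    """
    if len(copy_a) != len(copy_b):
        raise ValueError("copies must be aligned to the same length")
    return [i for i, (a, b) in enumerate(zip(copy_a, copy_b)) if a != b]


# Toy example: two 20-base "repeats" that differ at a single position.
copy_a = "ACGT" * 5
copy_b = copy_a[:13] + "A" + copy_a[14:]  # change the base at index 13

print(differing_positions(copy_a, copy_b))
```

On real data the copies would run to hundreds of kilobases and differ roughly once in 50,000 bases, which is why entire clones had to be sequenced before any distinguishing position could be found.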
This fine-tooth-comb sequencing will continue, with a mix of bench work and computer work, Wilson said. Computer tools include a program called Prefinish that sorts unfinished clones by degree of difficulty, then assigns each to a person with the appropriate skill level. The sequencing centers will also use Phrap and other finishing tools they develop.
Eventually, some researchers may want to finish the genome to a greater extent than others, Waterston predicted. "There may be a separation [between researchers] depending on how anal-compulsive people are," he said. "We had the same thing in the worm, where it is increasingly exhausting to get the last little bits."