First it was the number of genes. Now it’s the number of bases.
It turns out that the human genome may be as much as 10 percent smaller than originally thought. “The chromosomes are coming out smaller than what we expected,” says Robert Sutherland, who is working on finishing chromosome 16 at the Joint Genome Institute at Los Alamos National Laboratory.
Chromosome 16 was expected to contain 89 million bases. But now Sutherland and his colleagues say that it’s more like 80 million. “At first we were kind of concerned,” he says. Did they miss such a huge chunk? “But then we started hearing this from other groups as well,” he says.
If this pattern holds up, the genome may be as many as 300 million bases shorter than the oft-cited 3.1 billion.
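The arithmetic behind that figure can be sketched in a few lines of Python. This is only a back-of-the-envelope illustration: it assumes, hypothetically, that every chromosome shrinks by the same fraction as chromosome 16, and the function and variable names are made up for the example rather than taken from any JGI tool.

```python
# Figures quoted in the article, in bases.
CHR16_EXPECTED = 89_000_000    # earlier estimate for chromosome 16
CHR16_REVISED = 80_000_000     # revised estimate from Sutherland's group
GENOME_CITED = 3_100_000_000   # oft-cited size of the whole genome


def shrink_fraction(expected: int, revised: int) -> float:
    """Fraction by which the revised size falls short of the expected size."""
    return (expected - revised) / expected


def extrapolated_shortfall(genome_size: int, fraction: float) -> float:
    """Shortfall in bases if the whole genome shrank by `fraction`.

    Assumption: the chromosome 16 ratio applies uniformly genome-wide.
    """
    return genome_size * fraction


fraction = shrink_fraction(CHR16_EXPECTED, CHR16_REVISED)
shortfall = extrapolated_shortfall(GENOME_CITED, fraction)

print(f"Chromosome 16 shrinkage: {fraction:.1%}")                   # 10.1%
print(f"Extrapolated shortfall: {shortfall / 1e6:.0f} million bases")  # 313 million bases
```

Under that uniform-shrinkage assumption the shortfall comes to roughly 310 million bases, in line with the figure of about 300 million given above.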
The real answer won’t come until the gaps in the sequences are filled in. Chromosome 16, for example, still has 14 gaps of various sizes. “And we don’t know exactly how big those are,” says Sutherland. Most are likely less than one kilobase long, but some may be much larger.
Finishing it won’t be easy. “There are large repeats, large duplications, and very nasty areas that don’t clone into BACs,” Sutherland says. There are some sequences where the repeats are identical except for one base in every 2,000. And none of the current assembly programs is capable of handling such large repeats, he says. Yet getting it all is important. “Even in these very difficult regions, or even in the repeat regions, there are active genes.”
So the finishers must scour the chromosome base by base and piece everything together manually. “You have to go in and pull those sequences out so that they’re not assembled, and then assemble them separately,” Sutherland says. So far JGI has gotten through 48 megabases of number 16 and expects to be finished some time in September. “It is a lot of work,” says Sutherland. But if chromosome estimates continue to shrink, they may be further along than they realize.
— Aaron J. Sender