NEW YORK, Aug. 9 - The human genome is littered with spare parts, according to a new analysis by geneticists from Case Western Reserve University and Celera Genomics.
Genetic scientists have long known about gene-segment duplication, the process by which copied shards of DNA break off and wander into a new position in the genome. Sometimes, disease is the result. More often, the replica genes cause no immediate harm, and eventually wind up modified by evolution or "silenced" altogether.
This biological phenomenon has been a major frustration for gene sequencers, since the assembly programs designed to match overlapping fragments of DNA are easily confused by the thousands of duplicate or near-duplicate segments that lurk in the human genome.
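To see why duplicated segments trip up overlap-based assembly, consider a toy sketch in Python. It is only an illustration, not any real assembler and not the Case Western method; the sequences, the function names, and the minimum-overlap threshold are all made up for the example. A read that ends inside a duplicated segment overlaps equally well with reads from either copy, so the program cannot tell which one follows.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 if no such overlap of at least min_len bases exists.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def successors(reads, min_len):
    """Map each read to every other read it overlaps by >= min_len bases."""
    return {
        a: [b for b in reads if b != a and overlap(a, b, min_len) > 0]
        for a in reads
    }


# Hypothetical data: the segment GATTACA occurs at two loci that have
# different flanking sequence on the right-hand side.
REPEAT = "GATTACA"
reads = [
    "CCGT" + REPEAT,   # read that ends inside one copy of the repeat
    REPEAT + "TTGC",   # read that continues from copy 1
    REPEAT + "AGGA",   # read that continues from copy 2
]

graph = successors(reads, min_len=5)

# Reads with more than one possible successor are ambiguous to assemble.
ambiguous = {r: nxt for r, nxt in graph.items() if len(nxt) > 1}

for read, nxt in ambiguous.items():
    print(read, "could be followed by", nxt)
```

In this toy case the first read has two equally good continuations. A real genome has thousands of such duplicated or nearly duplicated segments, which is the scale of confusion the article describes.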
To solve this problem, the Case Western team, led by Jeff Bailey and Evan Eichler, devised an algorithm to pinpoint these copies in Celera's whole-genome shotgun data.
They identified nearly 9,000 segments of duplicated DNA, ranging in length from tens to hundreds of kilobases. By this calculation, copies account for at least 5 percent of the human genome.
The group also identified 169 regions of "genomic instability," areas where duplication-based rearrangement seems to happen more readily and which may be prime locations for genetic disease.
Another surprising discovery: Because duplicate sequences are so common, the team estimates that as many as 100,000 of the SNPs in the public database are probably misidentified, reflecting differences between duplicated copies of a sequence rather than true polymorphisms.
Evolutionary biologists will undoubtedly be interested in these findings, since, against expectations, many of the duplicated DNA segments in the human genome seem to be particularly dense with genes. One theory holds that duplication is one of the major engines of evolution: these "extra" gene segments, biologically superfluous, can be modified or specialized through mutation with less likelihood of causing harm.
The research appears in the Aug. 9 issue of Science.