Researchers on the project confirmed the existence of 19,599 protein-coding genes and another 2,188 DNA segments that are predicted to be protein-coding genes, according to the findings.
"The analysis found that some of the earlier gene models were erroneous due to defects in the unfinished, draft sequence of the human genome," said Jane Rogers, head of sequencing at the Wellcome Trust Sanger Institute in
The Nature paper also provides a peer-reviewed description of the finishing process and an assessment of the quality of the finished human genome sequence. According to that assessment, the finished sequence covers more than 99 percent of the euchromatic portion of the human genome and was sequenced to an accuracy of 99.999 percent.
Since the working draft was completed in 2002, the contiguity of the sequence has been improved. The average DNA letter now sits on a stretch of 38.5 million base pairs of uninterrupted sequence, about 475 times longer than the 81,500 base-pair stretch available before. The human genome sequence still contains 341 gaps, the consortium noted, down from the 150,000 gaps in the sequence when the working draft was completed. It said closing the remaining gaps would require more research and new technologies.
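The "stretch the average DNA letter sits on" reads like the statistic that genome assemblers usually call N50: the length such that at least half of all bases lie in uninterrupted stretches at least that long. The article does not name the metric, so that reading is an assumption. Under that assumption, the minimal Python sketch below shows how such a figure could be computed and checks the quoted ratio. The toy lengths are hypothetical and are not data from the consortium.

```python
def n50(lengths):
    """Return the largest length L such that stretches of length >= L
    together contain at least half of all bases."""
    total = sum(lengths)
    running = 0
    # Walk from the longest stretch to the shortest, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("lengths must be non-empty")


# Hypothetical toy assembly: five uninterrupted stretches (lengths in bases).
toy_stretches = [2, 3, 4, 5, 6]
toy_n50 = n50(toy_stretches)

# Ratio of the two figures quoted in the article.
finished_bp = 38_500_000
draft_bp = 81_500
fold_improvement = finished_bp / draft_bp

print(toy_n50)
print(round(fold_improvement, 1))
```

The ratio of the two quoted figures comes out a little above 472, which is consistent with the article's rounded "about 475 times".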
The finished sequence provides a much clearer view of certain phenomena, according to the researchers, such as the duplication of DNA segments and the birth and death of genes. For example, their analysis found that the distribution of segmental duplications varies widely across human chromosomes. The Y chromosome is the most extreme case, with segmental duplications occurring along more than 25 percent of its length.
In addition, researchers found that some segmental duplications tend to be clustered near the centromeres and telomeres of each chromosome. Researchers speculate that these segmental duplications may be used by the genome as an evolutionary lab for creating genes with new functions.
Authorship of the Nature paper is shared by more than 2,800 researchers who took part in the consortium, which includes scientists located at 20 institutions in
The finished sequence and its annotations can be accessed through several public genome browsers listed here.