
New Study from Broad Institute Lowers Human Gene Count to 20,500

NEW YORK (GenomeWeb News) – A study published online this week in the Proceedings of the National Academy of Sciences indicates that the number of protein-coding genes in the human genome may be much lower than the current estimate of around 24,500 genes. 
According to the study, authored by Michele Clamp and colleagues at the Broad Institute, human gene catalogs such as Ensembl, RefSeq, and Vega include many open reading frames that are actually "random occurrences" rather than protein-coding regions. That finding cuts the estimated number of protein-coding genes in the genome to around 20,500.
The Broad team analyzed ORFs that show no evidence of evolutionary conservation in mouse or dog. According to the researchers, it has been "broadly suspected" that many of these ORFs are "functionally meaningless," but no one had shown scientifically that they are not valid genes.
“As a result,” they note in the PNAS paper, “the human gene catalog has remained in considerable doubt.”
Clamp and colleagues developed a method to characterize the properties of putative genes that lack cross-species counterparts. By analyzing these nonconserved ORFs alongside the genomes of two primates, the researchers found that they are neither the result of gene innovation in the primate lineage nor the result of gene loss in mouse or dog.
This provides "strong evidence" that these nonconserved ORFs are indeed "spurious" and should be removed from the gene catalogs, according to the paper.
The Broad team acknowledged that the study has "certain limitations" that could affect the final gene count. For example, they note, they did not consider 197 putative genes that lie in regions omitted from the finished assembly of the human genome.
In addition, the authors explain in the paper, the nonconserved ORFs they studied were included in current gene catalogs "because they have the potential to encode at least 100 amino acids." As they put it, "we thus do not know whether our conclusions would apply to much shorter ORFs."
They also concede that additional protein-coding genes are likely yet to be found, but note that "the final total is likely to remain under 21,000."
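To make the 100-amino-acid criterion concrete, here is a minimal Python sketch of a naive ORF scan. It is an illustration under stated assumptions, not the Broad team's method or any gene catalog's pipeline. The function names `find_orfs` and `random_orf_probability` are hypothetical, only the forward strand is scanned, and an ORF is taken to run from an ATG to the first in-frame stop codon. The second function assumes that bases are uniform and independent, which real genomes are not, to show how long ORFs can arise by chance.

```python
from typing import List, Tuple

# Standard-code stop codons.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_aa: int = 100) -> List[Tuple[int, int]]:
    """Hypothetical helper: return 0-based (start, end) spans of forward-strand
    ORFs that begin at ATG, end at the first in-frame stop codon (end is
    exclusive and includes the stop), and encode at least min_aa amino acids."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):  # the three forward reading frames
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open an ORF at the first ATG in this frame
            elif start is not None and codon in STOP_CODONS:
                # Amino acids encoded = codons from ATG up to, not including, the stop.
                if (i - start) // 3 >= min_aa:
                    orfs.append((start, i + 3))
                start = None  # close the ORF and look for the next ATG
    return sorted(orfs)


def random_orf_probability(n_codons: int) -> float:
    """Assumption: uniform, independent bases. Probability that n_codons random
    codons contain no stop codon (61 of the 64 codons are not stops)."""
    return (61 / 64) ** n_codons


print(find_orfs("ATG" + "GCT" * 99 + "TAA"))  # [(0, 303)]
print(random_orf_probability(100))
```

Under that simplifying assumption, a run of 100 random codons avoids every stop codon with a probability of roughly 0.8 percent. Across a sequence as large as the human genome, that illustrates how many ORFs could meet the 100-amino-acid cutoff as "random occurrences."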