NEW YORK (GenomeWeb News) – A study published online this week in the Proceedings of the National Academy of Sciences indicates that the number of protein-coding genes in the human genome may be much lower than the current estimate of around 24,500 genes.
According to the study, published by Michele Clamp and colleagues at the Broad Institute, human gene catalogs such as Ensembl, RefSeq, and Vega include many open reading frames that are actually “random occurrences” rather than protein-coding regions — a finding that cuts the number of protein-coding genes in the genome to around 20,500.
The Broad team analyzed ORFs for which there is no evidence of evolutionary conservation with mouse or dog. According to the researchers, it has been “broadly suspected” that many of these ORFs are “functionally meaningless,” but there has been no scientific evidence showing that they are not valid genes.
“As a result,” they note in the PNAS paper, “the human gene catalog has remained in considerable doubt.”
Clamp and colleagues developed a method to characterize the properties of putative genes that lack cross-species counterparts. By analyzing these nonconserved ORFs alongside the genomes of two primates, the researchers found that they are neither the result of gene innovation in the primate lineage nor the result of gene loss in mouse or dog.
This offers “strong evidence” that these nonconserved ORFs are indeed “spurious,” and should be removed from the gene catalogs, according to the paper.
The Broad team acknowledged that the study has “certain limitations” that could affect the final gene count. For example, they note, they did not consider 197 putative genes that lie in regions that were omitted from the finished assembly of the human genome.
In addition, the authors explain in the paper, the nonconserved ORFs that they studied were included in current gene catalogs “because they have the potential to encode at least 100 amino acids.” They caution that “we thus do not know whether our conclusions would apply to much shorter ORFs.”
They also concede that it’s likely there are additional protein-coding genes yet to be found, but note that “the final total is likely to remain under 21,000.”