Researchers from the Spanish National Cancer Research Center (CNIO) report in Human Molecular Genetics that, based on an analysis of seven proteomic studies, the human genome contains fewer than 20,000 protein-coding genes.
"The coding part of the genome is constantly moving," Alfonso Valencia, the vice director of basic research at CNIO, says in a statement. "No one could have imagined a few years ago that such a small number of genes could make something so complex."
Valencia and his colleagues pooled seven large mass spectrometry-based analyses, which together identified 255,188 peptides. They then mapped these peptides to genes in the GENCODE 12 annotation of the human genome and confirmed that 11,840 protein-coding genes did indeed express proteins. Most of these confirmed genes, they note, correspond to old, conserved open reading frames (ORFs).
They add that, based on their analysis, about 2,000 other putative protein-coding genes did not appear to actually produce proteins.
"The human genome is the best annotated, but we still believe that 1,700 genes may have to be re-annotated," Valencia adds. "Our work suggests that we will have to redo the calculations for all genomes, not only the human genome."