NEW YORK (GenomeWeb News) – At least 60 de novo genes have cropped up in the human genome since our lineage diverged from that leading to chimpanzees, according to a study appearing online last night in PLoS Genetics.
Researchers from China and Canada developed a pipeline for comparing primate gene and protein sequences to one another in an effort to find ancestral non-coding sequences that have morphed into protein-coding genes in the human genome. The search turned up 60 potential de novo genes that, based on database searches, appear to be expressed at both the messenger RNA and protein levels.
Because these de novo protein-coding sequences appear to be most highly expressed in the cerebral cortex and testes, the team speculated that the genes may have been involved in the acquisition of human-specific traits.
"The number of de novo genes that we found in the human genomes is much higher than that expected based on previous estimates of the rate of de novo origination," corresponding author Ya-Ping Zhang, a researcher affiliated with the Chinese Academy of Sciences' Kunming Institute of Zoology and Yunnan University, and co-authors wrote. "Therefore, we suggest that a greater appreciation of de novo origination of genes is needed."
Zhang and colleagues performed a series of sequence comparisons to track down de novo protein-coding sequences, including BLAST searches comparing human protein sequences with those of the chimpanzee, orangutan, rhesus macaque, and marmoset.
After tossing out human sequences that were missing start or stop codons, the researchers were left with more than 350 genes that they subsequently used to search against chimpanzee and orangutan genome sequences. They also inspected sequences by hand to get rid of human genes that did not seem to have arisen from authentic ancestral non-coding sequences.
"To be a candidate de novo originated gene," the study authors explained, "in addition to having a potentially translatable open reading frame in the human genome, the gene must have been present and disrupted (i.e., non-translatable), in both the chimpanzee and orangutan genomes."
The search yielded 46 candidate genes, which the team then used to query gene and protein databases. In this initial screen, they found evidence of mRNA and protein expression for 27 of the genes. Additional screening that included genes from earlier versions of the Ensembl database turned up another 33 genes that seem to have arisen de novo in the human genome, bringing the total to 60.
"Each of these new genes has both transcriptional and proteomic evidences supporting their functionality," the study authors wrote.
By scrutinizing available expression data for the genes, including RNA sequencing data on 53 of them from past studies of 11 human tissues, the researchers found that the de novo gene candidates were most highly expressed in the cerebral cortex and testes, fueling speculation that the genes may have contributed to the development of some human-specific traits.
In a Perspective article in PLoS Genetics, University of Dublin genetics researchers Daniele Guerzoni and Aoife McLysaght pointed out that the new study builds on previous research showing that subtle changes in DNA sequence can lead to phenotypic changes.
Still, the pair cautioned that the criteria used to define de novo genes are crucial. For example, they wrote, "care must also be taken to ensure that the ancestral sequence can reliably be inferred to be non-coding … Ideally, the putative non-coding sequences should be investigated for evidence of transcription and translation to support the inference of absence of coding capacity."
And though Guerzoni and McLysaght noted that a more complete catalog of de novo genes in human and other primate genomes would help researchers understand how these genes influence phenotype, they also argued that more extensive studies will be needed to determine what functional roles, if any, de novo genes play.
"A major challenge remains to demonstrate functionality of the de novo genes," they wrote. "This is particularly difficult for human-specific genes, where there is perhaps the greatest interest, but there are also the greatest limitations in terms of possible experiments."