NEW YORK (GenomeWeb) – By mapping protein-coding parts of the human genome that are relatively impervious to change, a team from the University of Utah and the University of Colorado has identified mutations known for contributing to developmental disorders as well as new candidate genes for these and other conditions.
"The map we created will provide the community with a resource to study genes that heretofore had no disease association," corresponding author Aaron Quinlan, a human genetics researcher at the University of Utah, said in a statement. "The beauty and power of this approach is that, as we obtain more data from ever more human genomes, we can continue to improve the resolution of this map to pinpoint areas to study for disease."
As they reported online today in Nature Genetics, Quinlan and his colleagues used available exome sequence data for 123,136 individuals included in the Genome Aggregation Database (gnomAD) to identify so-called constrained coding regions (CCRs) in the human genome — sites that appeared particularly apt to contain pathogenic variants found in the National Center for Biotechnology Information's ClinVar database or in past developmental disorder studies.
"We demonstrate that the most constrained regions recover known disease loci, assist in the prioritization of de novo mutation, and illuminate new genes that may underlie previously unknown disease phenotypes," the authors wrote.
For their CCR mapping endeavor, the researchers considered almost 4.8 million missense or loss-of-function variants in 123,136 gnomAD exomes. Whereas one variant turned up roughly every seven coding bases, on average, they focused on stretches of coding sequence where such variants were far less common.
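The article does not spell out the algorithm, but the basic idea of looking for variant-depleted stretches can be illustrated with a toy sketch. The Python below is a simplified illustration only, not the authors' published method. It assumes a hypothetical concatenated coding sequence indexed from 0, and the function names `variant_free_stretches` and `percentile_ranks` are invented for this example. It finds runs of bases with no observed variant and ranks each run by length, so the longest runs land in the highest percentiles. A real analysis would need to account for factors this sketch ignores, such as sequencing coverage and local mutability.

```python
from bisect import bisect_left


def variant_free_stretches(variant_positions, coding_length):
    """Return (start, end, length) for each run of coding bases with no variant.

    Positions are 0-based offsets into a hypothetical concatenated coding
    sequence of `coding_length` bases; `end` is inclusive.
    """
    stretches = []
    prev = -1  # sentinel just before the first base
    # Append coding_length as a sentinel just past the last base.
    for pos in sorted(set(variant_positions)) + [coding_length]:
        length = pos - prev - 1
        if length > 0:
            stretches.append((prev + 1, pos - 1, length))
        prev = pos
    return stretches


def percentile_ranks(stretches):
    """Attach to each stretch the percentage of stretches that are strictly shorter."""
    lengths = sorted(length for _, _, length in stretches)
    n = len(lengths)
    return [
        (start, end, length, 100.0 * bisect_left(lengths, length) / n)
        for start, end, length in stretches
    ]


# Toy data: four variants in a 40-base coding sequence (made up for illustration).
toy_variants = [3, 4, 10, 30]
ranked = percentile_ranks(variant_free_stretches(toy_variants, 40))

# Keep only the most variant-depleted stretches (threshold chosen arbitrarily here).
most_constrained = [s for s in ranked if s[3] >= 75.0]
```

In this toy input the 19-base run between positions 10 and 30 is the only stretch that clears the cutoff. With real data, the cutoff would correspond to a percentile such as the 95th or 99th cited in the study.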
The team noted that variants classified as known pathogenic or likely pathogenic in ClinVar were over-represented in the resulting CCR map, even though just 1,415 genes housed at least one of the CCRs falling in the 99th percentile or higher. These included genes such as SCN1A, CACNA1A, and SMARCA2 that have been implicated in developmental delay, seizures, congenital heart conditions, and other diseases.
Likewise, the group's analysis of de novo missense mutations previously reported in thousands of children with neurodevelopmental disorders suggested that the highly constrained regions in the human genome were more than seven times as likely to contain such pathogenic de novo mutations.
Even so, the researchers found that many of the genes harboring prominent CCR sites did not raise red flags in ClinVar: more than 2,200 CCRs in the 99th percentile or higher and nearly 20,700 CCRs in the 95th percentile or higher were not classified as pathogenic in the database.
Those variants were especially common in essential genes, they reported, hinting that the CCR map may assist in variant classification and point to new disease risk genes or genes that cause embryonic lethality when mutated. Still, the authors cautioned that the approach may miss some authentic disease-related variants in variant-depleted regions that don't reach the CCR threshold.
"Looking forward, we argue that the most useful outcome of detailed maps of coding constraint is the ability to highlight critical regions in genes that have not yet been linked to human disease phenotypes," the authors wrote, adding that "[i]nvestigating the phenotypic effects of disrupting these regions provides an opportunity to identify new coding regions that drive disease phenotypes and are vital to human fitness."