NEW YORK – Researchers at the Wellcome Sanger Institute, Radboud University Medical Center in the Netherlands, Opko Health's GeneDx, and their collaborators have identified 285 genes that are significantly associated with developmental disorders, 28 of which have not previously been robustly associated with these conditions.
In a paper published on Wednesday in Nature, the researchers described how they searched for previously undescribed genes associated with developmental disorders. They integrated healthcare and exome sequencing data from 31,058 parent-offspring trios of individuals with developmental disorders and developed a simulation-based statistical test to identify gene-specific enrichment of de novo mutations. This analysis yielded the set of 285 genes.
However, they also noted that although they detected previously unidentified genes associated with developmental disorders, many of the excess de novo mutations in protein-coding genes remain unaccounted for. Modeling suggests that more than 1,000 genes associated with developmental disorders have not yet been described, many of which are likely to be less penetrant than the currently known genes, the researchers said.
"This study has really shown the benefits of access to healthcare data, not least to the approximately 500 families living with a developmental disorder who had not been able to get a diagnosis until now," co-lead author and Wellcome Sanger Institute researcher Matthew Hurles said in a statement. "But our findings also estimate that we require ten times as much data to be able to identify all the genes linked to developmental disorders. As such, greater access to anonymized patient data is crucial to our understanding of these conditions and our ability to help the families living with them."
The researchers pooled data on de novo mutations from patients with a developmental disorder from GeneDx, the Deciphering Developmental Disorders study, and Radboud University Medical Center. The mutations included 40,992 single nucleotide variants and 4,229 insertions or deletions. To detect gene-specific enrichments of damaging mutations, they developed a method named DeNovoWEST (de novo weighted enrichment simulation test) to score all classes of sequence variants on a unified severity scale. When they applied DeNovoWEST to all individuals in the cohort, the researchers identified 281 significantly enriched genes.
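The general idea of a simulation-based, severity-weighted enrichment test can be illustrated with a toy sketch. This is not the authors' DeNovoWEST implementation: the severity weights, the per-class expected mutation counts, the Poisson null model, and the function names below are all assumptions made for illustration.

```python
import math
import random

# Hypothetical severity weights per variant class (illustrative values only).
WEIGHTS = {"synonymous": 0.0, "missense": 0.3, "truncating": 1.0}


def poisson(lam, rng):
    """Draw one Poisson(lam) sample with Knuth's algorithm (fine for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def enrichment_p_value(observed, expected, n_sims=10_000, seed=0):
    """Estimate how often chance alone yields a severity score >= the observed one.

    observed: dict of variant class -> observed de novo count in one gene
    expected: dict of variant class -> expected count under the null
              (assumed here to be mutation rate times cohort size)
    """
    rng = random.Random(seed)
    obs_score = sum(WEIGHTS[c] * n for c, n in observed.items())
    hits = 0
    for _ in range(n_sims):
        # Simulate null counts for each class and score them on the same scale.
        sim_score = sum(WEIGHTS[c] * poisson(lam, rng) for c, lam in expected.items())
        if sim_score >= obs_score:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_sims + 1)


# Made-up example gene: far more damaging mutations than the null expects.
null_counts = {"synonymous": 0.5, "missense": 1.0, "truncating": 0.1}
p_enriched = enrichment_p_value({"missense": 5, "truncating": 6}, null_counts)
p_typical = enrichment_p_value({"missense": 1}, null_counts)
```

In this sketch, a gene with many truncating and missense mutations gets a very small p-value, while a gene with a single missense mutation does not. A real analysis would also need to correct for testing thousands of genes.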
The majority of these genes (196 out of 281) already had sufficient evidence of an association with developmental disorders to be considered of diagnostic utility by all three centers as of late 2019; the researchers called these consensus genes. Another 54 of the 281 were previously considered diagnostic by one or two centers and were termed discordant genes.
To discover novel disorder-associated genes with greater power, the researchers then applied DeNovoWEST to mutations in patients without damaging mutations in consensus genes and identified 94 significant genes, of which 33 were putatively novel associations with developmental disorders. Further refining analyses excluded five of these genes, leaving 28 novel genes, with a median of 10 non-synonymous de novo mutations, the researchers said.
They also investigated whether some of the synonymous mutations might be pathogenic by disrupting splicing. Separately, they found that 25 percent of the patient cohort had a non-synonymous mutation in one of the consensus or significant developmental disorder-associated genes. They noted significant sex differences in the autosomal burden of these non-synonymous mutations, with a significantly higher rate in female than in male individuals. However, the exome-wide burden of autosomal non-synonymous mutations in all genes was not significantly different between undiagnosed male and female participants. This indicated that there are subtle sex differences in the genetic architecture of developmental disorders, especially with regard to known and undescribed disorders.
"Overall, novel [developmental disorder]-associated genes encode proteins that have very similar functional and evolutionary properties to consensus genes," the authors wrote. "Despite the high-level functional similarity between known and novel [developmental disorder]-associated genes, non-synonymous [de novo mutations] in the more recently described [developmental disorder]-associated genes are much more likely to be missense [de novo mutations], and less likely to be [protein truncating variants]."
Importantly, the researchers observed a significant overlap of 70 genes between their set of 285 developmental disorder-associated genes and a set of 369 previously described cancer-driving genes. By modeling the germline mutation rate of the somatic driver mutations they observed, they found that recurrent non-synonymous mutations in The Cancer Genome Atlas were enriched 21-fold in their cohort. Their data suggested that these findings were driven by the pleiotropic effects of these mutations in development and tumorigenesis, rather than by the hypermutability of these variants, the researchers said.
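To give a sense of why an overlap of 70 genes stands out, one common approach is a hypergeometric test, sketched below. The article does not say which test the authors used, and the background of 19,000 protein-coding genes is an assumed round figure, not a number from the study; the function name is also hypothetical.

```python
from math import comb


def overlap_p_value(background, set_a, set_b, overlap):
    """P(overlap or more shared genes) if set_a genes were drawn at random
    from `background` genes, of which set_b belong to the second list."""
    upper = min(set_a, set_b)
    total = comb(background, set_a)
    tail = sum(
        comb(set_b, k) * comb(background - set_b, set_a - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


BACKGROUND_GENES = 19_000  # assumption: approximate count of protein-coding genes
DD_GENES = 285             # developmental disorder-associated genes (from the article)
CANCER_GENES = 369         # cancer-driving genes (from the article)
SHARED = 70                # genes in both sets (from the article)

# Overlap expected by chance under the assumed background.
expected_overlap = DD_GENES * CANCER_GENES / BACKGROUND_GENES
p_overlap = overlap_p_value(BACKGROUND_GENES, DD_GENES, CANCER_GENES, SHARED)
```

Under these assumptions, chance alone would produce an overlap of only about five or six genes, so 70 shared genes is far more than expected. The exact p-value depends on the assumed background size.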