NEW YORK (GenomeWeb) – A team from Princeton University, Rockefeller University, and elsewhere has developed a computational approach for untangling the contributions of noncoding variants in autism spectrum disorder (ASD).
"Our predictive genomics framework illuminates the role of noncoding mutations in ASD and prioritized mutations with high impact for further study," the authors wrote, noting that their general approach may be applicable to other complex conditions. The work was led by Olga Troyanskaya, deputy director for genomics at the Flatiron Institute's Center for Computational Biology and a professor of computer science at Princeton University, and Robert Darnell, professor of cancer biology at Rockefeller University.
As they reported online yesterday in Nature Genetics, the researchers used a deep-learning strategy to assess noncoding variants at thousands of sites in the genome using whole-genome sequencing data from individuals in almost 1,800 ASD families with one affected child. In addition to identifying pathways with predicted regulatory shifts in ASD in certain tissue types, they saw an overrepresentation of noncoding mutations in parts of the genome containing transcriptional regulators or RNA-binding protein regulators.
The team noted that while recent studies have unearthed a wealth of de novo copy number variants and point mutations implicated in ASD, the variants falling in protein-coding portions of the genome seem to explain less than one-third of the heritability observed in simplex ASD, leaving more work to do in untangling the consequences of the inherited or de novo noncoding variants associated with ASD.
In an effort to analyze noncoding variants while accounting for variable effect sizes and other potentially confounding factors, the study authors developed a "deep convolutional-neural-network-based" strategy for finding and predicting potential regulatory consequences of noncoding variants in nearly 7,100 genome sequences from members of 1,790 ASD simplex families.
The approach involves comparing noncoding mutational profiles in ASD-affected and -unaffected siblings in a manner "analogous to using the genetic codon code to distinguish non-synonymous mutation from synonymous mutations in protein-coding genes," the researchers explained, while incorporating information on related histone marks, chromatin accessibility features, transcription factor binding sites, and DNA- or RNA-binding protein targets identified through past efforts such as ENCODE or the Roadmap Epigenomics project.
"Our framework estimates, with single-nucleotide resolution, the quantitative impact of each variant on 2,002 specific transcriptional and 232 specific post-transcriptional regulatory features," they wrote, "including histone marks, transcription factors, and [RNA-binding protein] profiles."
Using this strategy, the researchers saw a significant uptick in de novo noncoding variants with predicted functional effects in the children with ASD compared with their unaffected siblings, including alterations predicted to have pronounced functional impacts near genes known for loss-of-function intolerance. Their tissue-focused analyses, which used data for dozens of tissue or cell types assessed for the Genotype-Tissue Expression project, pointed to an overrepresentation of variants with regulatory roles in brain tissue.
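For readers who want a feel for the affected-versus-unaffected comparison described above, the following is a minimal sketch, not the authors' statistical procedure. It assumes each de novo variant has already been given a single predicted impact score and uses a generic one-sided permutation test, swapped in here purely for illustration, to ask whether scores in affected children tend to be higher than in siblings. The function name `permutation_test` and all scores are hypothetical.

```python
import random
from statistics import mean


def permutation_test(proband_scores, sibling_scores, n_permutations=10000, seed=0):
    """One-sided permutation test on the difference in mean predicted impact.

    Returns (observed_difference, p_value), where the p-value estimates how
    often a random relabeling of the pooled scores produces a difference at
    least as large as the one observed.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = mean(proband_scores) - mean(sibling_scores)

    pooled = list(proband_scores) + list(sibling_scores)
    n_pro = len(proband_scores)

    at_least_as_large = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign proband/sibling labels
        diff = mean(pooled[:n_pro]) - mean(pooled[n_pro:])
        if diff >= observed:
            at_least_as_large += 1

    # Add-one correction keeps the estimated p-value strictly above zero.
    p_value = (at_least_as_large + 1) / (n_permutations + 1)
    return observed, p_value


# Hypothetical per-variant impact scores, invented for illustration only.
proband = [0.9, 0.8, 0.85, 0.7, 0.95, 0.75]
sibling = [0.2, 0.3, 0.1, 0.4, 0.25, 0.35]

observed, p = permutation_test(proband, sibling)
print(f"difference in mean predicted impact: {observed:.3f}, p = {p:.4f}")
```

A permutation test is used in this sketch because it makes no distributional assumptions and needs only the standard library; the study itself may rely on different statistics and on corrections for confounders not modeled here.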
Along with follow-up analyses focused on the functions of noncoding variants with potential ties to ASD and the burden of specific de novo noncoding variants in individuals with ASD, the team used a network-neighborhood differential enrichment analysis to look for frequently affected pathways or biological processes.
The latter analysis highlighted that networks containing genes expected to have strong ties to disease, genes involved in synaptic transmission or neurogenesis, and chromatin regulation-related genes, among others, tended to be more frequently impacted by noncoding alterations in the children with ASD than in their siblings.
Based on these and other findings, the authors suggested that "coding and noncoding mutations affect overlapping processes and pathways, which indicates a convergent genetic landscape and highlights the potential for the discovery of ASD-associated genes with combining coding and noncoding mutations."
In addition to developing an interactive web site for those interested in digging into their data, the authors suggested that a similar deep learning method may be applicable to other complex conditions with noncoding variant contributions.
"Now we open the field to understand all the factors that may be involved in autism," co-first author Chandra Theesfeld, a research scientist in Troyanskaya's integrative genomics lab at Princeton, said in a statement.