NEW YORK (GenomeWeb) – Columbia University researchers have developed a new approach to uncover stretches of the non-coding human genome that are more likely to harbor pathogenic variants.
While scouring the human exome has tied a number of variants to genetic conditions, variation within the non-coding portion of the genome also contributes to disease, according to Columbia's David Goldstein and his colleagues.
The team developed a tool, named Orion, that detects parts of the non-coding genome that are intolerant to variation and subject to purifying selection. Variants that do crop up in these regions are more likely to contribute to disease. As they reported in PLOS One today, the researchers found that regions Orion predicted to be more intolerant to change coincided with the locations of known non-coding pathogenic variants.
"At this point, we are optimistic that Orion will constitute one helpful tool in the effort to identify variants throughout the genome that influence the risk of both rare and common diseases," Goldstein said in a statement.
The researchers' Orion tool scans the genome for regions that harbor lower-than-expected levels of variation within the human population. By gauging the difference between the expected level of variation and what's actually observed, the tool quantifies the level of intolerance to variation.
The researchers applied this approach to a set of 1,662 samples that had undergone whole-genome sequencing. Using a sliding window approach, they calculated an intolerance score for each 501-base-pair window across all autosomal chromosomes. By examining 100,000 random Orion scores from across the genome, the researchers found that exons, ultra-conserved regions, and DNase hypersensitivity sites all had higher Orion scores, reflecting an increased intolerance to variation.
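The sliding-window idea can be illustrated with a minimal Python sketch. It is a simplification, not the authors' statistic: it assumes a hypothetical per-base `expected_rate` and scores each window as the expected variant count minus the observed count, so that higher scores mean fewer variants than expected. The function name, its inputs, and the scoring formula are all assumptions made for illustration.

```python
from typing import List

WINDOW = 501  # window size in base pairs, as reported in the article


def window_scores(variant_flags: List[int],
                  expected_rate: float,
                  window: int = WINDOW) -> List[float]:
    """Return one toy intolerance score per sliding window.

    variant_flags[i] is 1 if position i varies in the cohort, else 0.
    expected_rate is an assumed per-base probability of variation.
    Score = expected count - observed count (higher = more intolerant).
    """
    n = len(variant_flags)
    if n < window:
        return []
    expected = expected_rate * window
    # Count variants in the first window, then update incrementally.
    observed = sum(variant_flags[:window])
    scores = [expected - observed]
    for start in range(1, n - window + 1):
        # Add the position entering the window, drop the one leaving it.
        observed += variant_flags[start + window - 1] - variant_flags[start - 1]
        scores.append(expected - observed)
    return scores
```

With a toy window of four bases and an expected rate of 0.5, the call `window_scores([1, 1, 0, 0, 0, 0], 0.5, window=4)` scores three windows, and the score rises as the window slides away from the two variable positions.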
The researchers also applied the Orion approach to the exons of 1,000 randomly selected genes and 1,000 randomly selected, size-matched stretches of non-coding sequence. As they expected, Orion found the exons to be more intolerant of change than the non-coding sequences.
To test whether Orion could predict pathogenic variation, the researchers developed two sets of variants from the ClinVar database. One set consisting of some 5,000 variants was deemed to be non-coding and benign, while the other set of 223 variants was non-coding but pathogenic.
After excluding common variants from the benign ClinVar set, the researchers found that only three of the remaining benign variants fell within Orion regions, while 31 (14 percent) of the 223 pathogenic variants did. This suggested to the researchers that their Orion approach could capture pathogenic mutations.
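The comparison above amounts to asking what share of each variant set lands inside intolerant regions. A rough sketch of that overlap calculation follows. It assumes a single chromosome, variant positions given as integers, and intolerant regions given as sorted, non-overlapping half-open intervals. The function name and data layout are hypothetical and do not come from the study.

```python
from bisect import bisect_right
from typing import List, Tuple


def fraction_in_regions(positions: List[int],
                        regions: List[Tuple[int, int]]) -> float:
    """Fraction of positions that fall inside any [start, end) region.

    Assumes regions are sorted by start and do not overlap.
    """
    if not positions:
        return 0.0
    starts = [start for start, _ in regions]
    hits = 0
    for pos in positions:
        # Find the last region whose start is <= pos, if any.
        i = bisect_right(starts, pos) - 1
        if i >= 0 and pos < regions[i][1]:
            hits += 1
    return hits / len(positions)
```

Running the function once on the pathogenic set and once on the benign set gives two fractions that can be compared directly. A formal comparison would also need a significance test, which this sketch leaves out.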
Goldstein and his colleagues applied this approach to two cohorts, one of people with autism and one of people with epileptic encephalopathies, who'd undergone exome sequencing. While they didn't uncover an increase in de novo mutations within Orion regions among the cases in the autism cohort, they did find one in the epilepsy cohort. This, they said, illustrates that de novo mutations are preferentially drawn from Orion-intolerant regions.
Based on their findings, the researchers said Orion could be used to prioritize variants found in patients thought to have genetic conditions. They argued that Orion scores would improve the ability to interpret patients' whole-genome sequencing data.
"We anticipate that researchers will immediately start using Orion to help them find pathogenic mutations in patients in which previous sequencing efforts were negative," Goldstein added.