NEW YORK – An international research team has tracked down a slew of previously unappreciated regulatory elements in primates with an approach focused on finding constrained sequence elements (CSEs) across hundreds of primate genome assemblies.
"We discover hundreds of thousands of regulatory elements that emerged very recently in evolution and are specific to primates and humans, and are not present in other mammals," Kyle Kai-How Farh, VP of artificial intelligence at Illumina, said in an email.
As they reported in Nature on Wednesday, Farh and colleagues from the Illumina Artificial Intelligence Laboratory, Baylor College of Medicine, Pompeu Fabra University, and other international centers characterized sequences that are under constraint in primates with the help of an alignment that encompassed genome sequences for 239 primate species, including new genome assemblies for 187 species.
With this approach, the team flagged 111,318 DNase I hypersensitivity sites and more than 267,400 evolutionarily constrained transcription factor binding sites, digging into the constrained regulatory sequence elements that distinguish humans and other primates from the broader mammalian group.
The primate-constrained cis-regulatory elements (CREs), which the authors called "unique evolutionary records that provide a lens through which to view the mechanisms of recent exaptations [adaptations that alter the initial function of a sequence] leading to our species," turned up at sites in the human genome that were missed by previous analyses focused on all mammals.
"In keeping with prior work showing that noncoding DNA evolves more rapidly than protein-coding sequences, we find that many human CREs that previously showed no evidence of sequence constraint are in fact constrained exclusively in primates," the authors reported, "considerably expanding the number of known constrained noncoding elements in the human genome."
In particular, the team noted that genetic risk variants implicated in complex traits or common diseases in humans in prior genome-wide association studies were overrepresented at sites that appeared evolutionarily constrained in the analysis, while expression quantitative trait loci (eQTLs) found in the GTEx study tended to turn up at regulatory element sites that became constrained relatively recently in primates.
"The study improves our understanding of disease variants in the noncoding genome," Farh said. "Noncoding DNA evolves much more quickly than protein-coding sequence, and we find that these newly evolved regulatory sequences are strongly enriched for genetic variants underlying human common diseases."
Given the results of the study, Illumina reportedly plans to make genome-wide conservation scores available to customers through the Illumina Dragen, Emedgene, Connected Insights, and Illumina Connected Analytics tools.
Moreover, the CSE collection is expected to grow and be refined as additional primate genomes are sequenced, assembled, and analyzed.
"Additional sequencing of the remaining species in the primate order, including population-level oversampling of key lineages, would help to provide the resolution needed to detect sequence elements under selective constraint in finer detail, especially those specific to clades from which the human species ultimately emerged," the authors concluded.