NEW YORK (GenomeWeb) – People harbor a number of transcription factor binding site variants, and a new study from a Stanford University team has found that the genes affected by those variants have functions that correlate with those people's medical histories.
Researchers led by Stanford's Gill Bejerano scoured the publicly available genomes and personal medical histories of five people for variants that affect ancestral transcription factor binding sites. As the researchers reported today in PLOS Computational Biology, the genes nearest to these affected transcription factor binding sites are involved in processes linked to those individuals' medical histories. This, they added, suggests that mutational load can lead to alterations in gene regulation and heritable phenotypes.
"The beauty of having whole genomes available for study is that you can then ask completely agnostic questions," Bejerano, an associate professor at Stanford, said in a statement. "We set out to find hidden layers of susceptibility in the regulatory regions of these genomes. We were very pleased that our analysis gave such clear and significant associations between the mutations and medical histories."
Rather than searching across all transcription factor binding sites, Bejerano and his team homed in on those where variants would likely be deleterious: sites that are highly conserved throughout evolution.
Using a library of 657 different transcription factors, the researchers predicted binding sites in the human reference genome that are conserved across species. They then compared these predicted sites with the variants found in a given individual to identify variants that overlapped them. From those, they selected the variants at positions where the human reference base matches its chimpanzee ortholog, an indication that the reference base is likely the ancestral state.
Using PRISM, a software program they had previously devised, the researchers predicted whether those nucleotide changes would likely disrupt transcription factor binding. They then kept for analysis the binding sites where the individual's variant, the derived allele, is predicted to decrease binding affinity compared with the ancestral allele.
They dubbed these sites conserved binding site eroding loci, or CoBELs.
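The filtering steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the team's PRISM pipeline: the `Variant` record, its field names, and the toy affinity scores are all hypothetical, and the binding-site overlap and affinity predictions are assumed to have been computed elsewhere.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Variant:
    """Hypothetical record for one single-nucleotide variant in one person."""
    chrom: str
    pos: int
    ref_base: str            # human reference base
    alt_base: str            # base seen in the individual (derived allele)
    chimp_base: str          # base at the orthologous chimpanzee position
    in_conserved_site: bool  # overlaps a predicted conserved binding site
    ref_affinity: float      # assumed binding score for the reference allele
    alt_affinity: float      # assumed binding score for the individual's allele


def is_cobel(v: Variant) -> bool:
    """Apply the three filters described in the article, in order."""
    # 1. The variant must overlap a predicted cross-species conserved site.
    if not v.in_conserved_site:
        return False
    # 2. The reference base must match chimpanzee, so it is likely ancestral.
    if v.ref_base != v.chimp_base:
        return False
    # 3. The individual's allele must be predicted to weaken binding.
    return v.alt_affinity < v.ref_affinity


def select_cobels(variants: List[Variant]) -> List[Variant]:
    """Return only the variants that pass all three filters."""
    return [v for v in variants if is_cobel(v)]


# Made-up example data; none of these positions or scores come from the study.
EXAMPLE_VARIANTS = [
    Variant("chr1", 1000, "A", "G", "A", True, 0.9, 0.4),   # passes all filters
    Variant("chr1", 2000, "A", "G", "A", False, 0.9, 0.4),  # not in a conserved site
    Variant("chr2", 3000, "C", "T", "T", True, 0.8, 0.3),   # reference is not ancestral
    Variant("chr2", 4000, "G", "A", "G", True, 0.5, 0.7),   # binding predicted to increase
]

cobels = select_cobels(EXAMPLE_VARIANTS)
```

In this toy data set only the first variant survives, because each of the others fails exactly one of the three filters.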
Bejerano and his colleagues generated a list of CoBELs for five individuals whose genomic and medical data are publicly available: Stephen Quake, George Church, Misha Angrist, Rosalynn Gill, and James Lupski.
For each of those five people, Bejerano and his team examined whether their CoBELs were preferentially situated near genes with certain functions. Using the Genomic Regions Enrichment of Annotations Tool (GREAT), which they developed, the researchers gauged whether these changes would affect certain groups of nearby genes. From this, they found that the CoBELs from each of these individuals were enriched for different functions.
Further, these enriched functions tracked with the individuals' medical histories, the researchers reported.
For instance, of Quake's 6,321 CoBELs, 57 were located in the regulatory domains of genes linked to abnormal cardiac output. That's twice as many as would be expected to be there by chance, the researchers noted. Quake's medical history, they added, is marked by a family history of arrhythmogenic right ventricular dysplasia or cardiomyopathy and a possible instance of sudden cardiac death.
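To give a sense of how an enrichment like this might be assessed, the sketch below runs a one-sided binomial test on the counts quoted above. It is a simplified stand-in rather than the GREAT analysis itself: the background fraction is an assumption, back-calculated from the article's "twice as many" figure, and the function and constant names are hypothetical.

```python
import math


def binomial_tail(n: int, k: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p), summing terms in log space."""
    log_p = math.log(p)
    log_q = math.log1p(-p)
    total = 0.0
    for i in range(k, n + 1):
        # log of C(n, i) * p**i * (1 - p)**(n - i)
        log_term = (
            math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * log_p + (n - i) * log_q
        )
        total += math.exp(log_term)
    return total


N_COBELS = 6321  # Quake's CoBELs, from the article
N_HITS = 57      # CoBELs near genes linked to abnormal cardiac output

# Assumption: "twice as many as expected" implies about half as many by chance.
ASSUMED_EXPECTED = N_HITS / 2
ASSUMED_FRACTION = ASSUMED_EXPECTED / N_COBELS

fold_enrichment = N_HITS / (N_COBELS * ASSUMED_FRACTION)
p_value = binomial_tail(N_COBELS, N_HITS, ASSUMED_FRACTION)

print(f"fold enrichment: {fold_enrichment:.2f}")
print(f"one-sided binomial p-value: {p_value:.3g}")
```

The p-value printed here depends entirely on the assumed background fraction, so it should not be read as a result from the study.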
Similarly, Church, who has narcolepsy, has an enrichment in his CoBELs for preganglionic parasympathetic nervous system development, and the researchers noted that the autonomic nervous system is thought to have a role in narcolepsy. At the same time, Gill, who is hypertensive, has an enrichment in her CoBELs for decreased circulating sodium level.
Further, Bejerano and his colleagues found that these enrichments, while present in these five people individually, were rare within the 1000 Genomes Project cohort.
This, they said, suggests that small deleterious mutations in gene regulation that crop up over generations can be expressed as familial disease phenotypes.
"We are the sum of billions of transcription-factor-binding events in thousands of cell types throughout our bodies," Bejerano said. "Not every disease will be amenable to this type of analysis. But this study shows that nature, even the noncoding genome, can be very benevolent when you ask the right questions. And it may help us begin to combine our knowledge about variations, or mutations, that occur throughout the genome."
According to Stanford, Bejerano and his co-author Harendra Guturu, also at Stanford, have filed a patent application on the algorithm they used in this study.