NEW YORK (GenomeWeb News) – The three-dimensional structure of DNA can provide new clues about which regions of the genome are functional and evolutionarily conserved, according to a paper appearing in Science Express today.
A team of researchers from Boston University, the National Center for Biotechnology Information, and the National Human Genome Research Institute developed an algorithm for analyzing and comparing a DNA molecule's three-dimensional structure based on data from chemical experiments. By comparing DNA structure between species and looking at how phenotype-related SNPs alter structure, the team linked structure with function.
Their results suggest that some 12 percent of bases in the human genome are under evolutionary constraint — twice as many as predicted by sequence alone.
"This new approach is an exciting advance that will speed our efforts to identify functional elements in the genome, which is one of the major challenges facing genomic researchers today," NHGRI Scientific Director Eric Green, who was not involved in the study, said in a statement. "Coupled with continued innovation in DNA sequencing, this topography-informed approach will expand our ongoing efforts to use genomic information to improve human health."
Predicting the structure of a DNA molecule based on sequence is tricky, since similar DNA sequences can have different three-dimensional structures while different sequences can have comparable structures.
For this study, the researchers developed an algorithm based on three-dimensional structures deciphered from the hydroxyl radical cleavage pattern of DNA. Because hydroxyl radicals are extremely reactive, they can pluck hydrogen from DNA and use it to create water, cutting the DNA backbone in the process, co-senior author Thomas Tullius, a Boston University chemist, told GenomeWeb Daily News.
The cleavage patterns indicate which parts of the DNA backbone are solvent accessible, revealing the three-dimensional shape of the DNA.
The researchers had the chance to map DNA structure across large swaths of the genome using data from the Encyclopedia of DNA Elements, or ENCODE, project — a public research consortium spearheaded and funded by NHGRI to identify functional elements in the human genome.
To bring together information on DNA structure and sequence and compare structural profiles between and within species, Tullius and his colleagues developed a computer program called Chai. The algorithm is somewhat analogous to the binomial conservation, or binCons, algorithm, the team noted, though binCons evaluates only primary nucleotide sequence data rather than looking at structural patterns in the genome.
Tullius credits Boston University bioinformatics graduate student Stephen Parker with coming up with the new algorithm, which he said can be used to generate a picture of the structural profile for the whole genome.
"We brought together two diverse fields to think about this problem in a new way," co-senior author Elliott Margulies, an investigator at the National Human Genome Research Institute's Genome Technology Branch, said in a statement. "It took the combined expertise of a DNA chemist and computational biologist to figure out that this chemical technique could advance our understanding of comparative genomics."
When the researchers ran Chai and binCons on ENCODE pilot project regions, evaluating 30 million bases of DNA from three dozen different species, they found that Chai consistently identified more evolutionarily constrained bases than the sequence-based binCons algorithm.
Their results suggest that 12 percent of the bases in the human genome are subject to evolutionary constraint — up from the six percent identified by sequence data alone.
By examining DNase I hyper-sensitive sites and predicted enhancer regions, the team looked at how many of the regions identified by each algorithm contained functional elements. Again, Chai seemed to pull out functional sequences that binCons missed.
"Focusing our analysis on regions identified only by Chai and not by binCons (Chai-only regions) resulted in a statistically significant overrepresentation of non-coding functional sequences ... in the Chai-only regions and a statistically significant under-representation of coding regions," the authors explained.
And when the team used Chai and binCons to assess a dozen genome regions expected to contain enhancers, they found that seven of the regions could be identified by both the Chai and binCons algorithms. But five were only detected by Chai.
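The "Chai-only" comparison described above amounts to a set difference over genomic positions: bases covered by one method's regions but not by the other's. The Python sketch below illustrates that idea only. It is not the authors' code, and the function names and coordinates are hypothetical, chosen purely for illustration.

```python
def covered_bases(regions):
    """Expand half-open (start, end) intervals into a set of base positions."""
    bases = set()
    for start, end in regions:
        bases.update(range(start, end))
    return bases


def only_in_first(first, second):
    """Return positions covered by `first` regions but not by `second`."""
    return covered_bases(first) - covered_bases(second)


# Hypothetical toy coordinates, not data from the study.
chai_regions = [(100, 120), (300, 310)]
bincons_regions = [(110, 130)]

chai_only = only_in_first(chai_regions, bincons_regions)
print(len(chai_only))  # 20 bases: positions 100-109 and 300-309
```

Expanding intervals into per-base sets keeps the sketch short but would be memory-hungry at genome scale, where an interval-based approach would be the more practical choice.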
Consistent with the notion that DNA structure in non-coding regions can alter biological function, the researchers found dramatic structural differences between high- and low-affinity binding sites for the mammalian transcription factor Zif268 and the archaeal transcription regulator Ss-LrpB.
Next, the researchers looked at whether phenotype-related SNPs in non-coding regions also altered DNA structure. Indeed, when they evaluated 734 SNPs in non-coding regions of the human genome that were linked to phenotypes in the Phenotypes for ENCODE, or PhenCode, database, the team found that phenotype-related SNPs tended to produce larger structural changes in the genome than neutral variations.
In the future, Tullius said he would like to see this approach extended to look at the entire human genome — building on the one percent or so of the genome they've evaluated so far. He is also interested in learning more about the link between DNA structure and phenotype and/or disease.
In addition, Tullius emphasized, the findings suggest researchers need to shift away from thinking of DNA as just a sequence of letters, since taking a three-dimensional view of the molecule provides new functional and evolutionary information.