Many sequence analysis tools currently available don't adequately account for both the proximal and distal binding sites of cis-regulatory genomic regions, but, according to Gill Bejerano, they're not to blame. "These tools are perfect for what they were written for," namely microarray analysis, the assistant professor at Stanford University says. As next-generation sequencing data sets become more common, researchers need an analytical tool "that really models next-generation sequencing data for what it really is," Bejerano says. He and his team have come up with a GREAT solution (the Genomic Regions Enrichment of Annotations Tool, that is).
The team behind the GREAT algorithm is poised to help researchers extract the most information from their data sets by using a comprehensive, statistically validated approach. "Roughly 50 percent ... of the binding sites people are just excluding right now," Bejerano says. "There's a lot of information out there that, [in] the experiment you spend so much time and energy on, is completely getting discarded right now."
Central to GREAT's novelty is its statistical test. Given a set of input genomic regions, the tool draws on extensive ontologies of gene annotations and computes enrichments with a binomial test that accounts for the variability in gene regulatory domain size. The test "actually penalizes larger regulatory domains, so there's an inherent healthy balance between assigning more and more of the genome to any given gene," Bejerano says.
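To make the idea of a size-aware binomial test concrete, here is a minimal sketch rather than GREAT's actual code. It assumes that, for a single annotation term, p is the fraction of the genome covered by the regulatory domains of genes carrying that term, n is the number of input regions, and k is the number of regions that land inside those domains; the p-value is then the upper tail of a Binomial(n, p) distribution. The function names, the single-coordinate toy genome, and the non-overlapping domains are all hypothetical choices made for illustration.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def term_enrichment(region_midpoints, term_domains, genome_length):
    """Binomial enrichment of input regions in one term's regulatory domains.

    region_midpoints: positions of the input regions (hypothetical toy input).
    term_domains: non-overlapping (start, end) half-open intervals assumed to be
        the regulatory domains of genes annotated with the term.
    genome_length: total length of the toy genome.
    Returns (k, n, p, p_value).
    """
    # Fraction of the genome assigned to this term; larger domains raise p.
    covered = sum(end - start for start, end in term_domains)
    p = covered / genome_length
    n = len(region_midpoints)
    # Count input regions whose midpoint falls inside any domain of the term.
    k = sum(
        any(start <= pos < end for start, end in term_domains)
        for pos in region_midpoints
    )
    return k, n, p, binomial_upper_tail(k, n, p)


# Toy example: 5 input regions on a genome of length 1000.
k, n, p, p_value = term_enrichment(
    region_midpoints=[120, 150, 510, 700, 900],
    term_domains=[(100, 200), (500, 550)],
    genome_length=1000,
)
print(k, n, p, p_value)
```

Because p grows with the amount of genome assigned to a term, the same number of hits yields a weaker p-value when the regulatory domains are larger, which is one way to read the penalty Bejerano describes.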
What's more, GREAT can pinpoint how specific transcription factors plug into pathways and functions, he says. While GREAT is most readily useful for analyzing ChIP-seq data, Bejerano says the algorithm is applicable to any "genomic [region] that you think is enriched for cis-regulatory action ... as long as the events themselves are actually localized," and is therefore useful for certain epigenetic modifications and comparative genomics analyses.