Researchers from the Wellcome Trust Sanger Institute and Manchester University have developed an open-source program that promises to accelerate gene regulation research. NestedMICA is a new motif-finding program capable of handling megabase-scale sequence data.
Whereas earlier approaches to gene regulation research required investigators to align many sequences to identify control regions, NestedMICA looks for the most common motifs in a single sequence. “I think this is potentially quite important because there’s some evidence around that suggest that some gene regulatory elements are evolving quite far and yet some are conserved, but you don’t necessarily have another genome with exactly the same motifs in exactly the same places,” says Thomas Down of Manchester University. “By purely looking at alignment, I think you run the risk of missing important stuff. So we just analyze the single genome.”
Earlier this year, the team fed NestedMICA the stretch of Drosophila genome sequence adjacent to each gene. The program identified 120 potential regulatory motifs, most of which were confirmed using experimental imaging techniques. “Some of them are better supported than others,” Down says. “About a quarter of the motifs we can pretty confidently associate with a specific pattern of gene expression.” Of the 120 motifs, 30 have been known to Drosophila investigators for some time. According to Down, most of the motifs are found near the start of genes, where one would expect to find them, although not all are preferentially conserved.
“We hope that we’ll be able to use this map strategy to find essentially complete sets of all the regulatory motifs in a genome,” he says. “We hope that we can scale up this analysis to the point where we can find the vast majority of motifs in model organisms, like fruit fly and also in vertebrate, and the human genome.”
The program is currently available for download through the Sanger Institute website. “The program is out there and is there for anyone who’s interested in doing their own analysis,” Down says. “We’ve released our own motif set, and we hope we’ll be producing the same sort of motif annotation of the genome over the next couple of years.”