Researchers at the University of South Wales have developed CisExpress, a new, freely available web-based tool that identifies gene regulatory motifs and offers insight into the molecular mechanisms that may govern gene co-regulation.
According to its developers, CisExpress, which is described in a recently published Bioinformatics paper, was designed to give researchers a single, robust method for identifying sequence motifs, one that is more effective than running multiple tools in search of the most accurate results. The statistical tool detects sequence motifs in whole-genome and transcriptome datasets by looking for "similarities between the promoters of similarly expressed genes."
It is an improved version of an existing tool called Motifer, a program for identifying promoter motifs that was developed in 2009 by researchers from the agricultural firm Ceres and Loyola Marymount University. Denis Murphy, head of the University of South Wales' genomics and computational biology research group and a co-author of the Bioinformatics paper, told BioInform that, among other changes, his team modified Motifer so that their version could process large datasets.
These changes include a farming algorithm that lets CisExpress "exploit" multiple computational cores on the High Performance Computing Wales network, a supercomputing facility supported by the Welsh Government and several academic institutions that provides the compute power on which the tool runs.
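The article does not describe how CisExpress's farming algorithm is implemented. As a rough illustration of the general task-farm pattern only, the Python sketch below has a master hand independent jobs (here, counting how many toy promoters contain each candidate k-mer) to a pool of workers and collect the results in order. The promoter sequences and the names `farm` and `count_promoters_with` are hypothetical. It uses a thread pool so that it runs anywhere; for CPU-bound work in CPython, a `ProcessPoolExecutor` (run under an `if __name__ == "__main__":` guard) or a cluster scheduler would be needed to actually occupy multiple cores.

```python
from concurrent.futures import Executor, ThreadPoolExecutor
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")
R = TypeVar("R")

# Toy promoter sequences (hypothetical data, for illustration only).
PROMOTERS = ["ACGTTATAAAGG", "GGCACGTGTTAT", "TTTATAAACACG"]


def count_promoters_with(kmer: str) -> tuple[str, int]:
    """Worker task: count how many promoters contain `kmer` at least once."""
    return kmer, sum(kmer in seq for seq in PROMOTERS)


def farm(tasks: Iterable[T], worker: Callable[[T], R], executor: Executor) -> list[R]:
    """Master side of a task farm: hand every independent task to the pool
    and gather the results in the original task order."""
    return list(executor.map(worker, tasks))


if __name__ == "__main__":
    # Threads keep the sketch portable; swap in ProcessPoolExecutor for real CPU-bound work.
    with ThreadPoolExecutor(max_workers=4) as pool:
        print(farm(["TATA", "CACGTG"], count_promoters_with, pool))
```

Because each k-mer count is independent of the others, the tasks need no communication with one another, which is what makes this kind of workload a natural fit for farming out across many cores.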
CisExpress also rests on a few "mathematical assumptions," Murphy said. According to the paper, these are that "the function of promoter motifs is position-specific" and that "microarray data provide reasonable measurements of transcript abundance and reflect promoter activity."
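To make the first assumption concrete: if motif function is position-specific, a match only counts when it sits in the expected place relative to the transcription start site (TSS). The minimal sketch below assumes each promoter string ends at the TSS (so its last base is position -1) and accepts a match only if its start lies within `window` bases of a chosen `center`. The function names and this coordinate convention are illustrative assumptions, not CisExpress's own.

```python
def positional_hits(promoter: str, motif: str, center: int, window: int) -> list[int]:
    """Return TSS-relative start positions of `motif` matches that lie within
    `window` bases of `center`.

    Assumes the promoter string ends at the TSS, so its last base is at -1
    and its first base is at -len(promoter).
    """
    n = len(promoter)
    hits = []
    start = promoter.find(motif)
    while start != -1:
        rel = start - n  # convert a 0-based string index to a TSS-relative coordinate
        if abs(rel - center) <= window:
            hits.append(rel)
        start = promoter.find(motif, start + 1)  # continue searching; allows overlaps
    return hits


def positional_frequency(promoters: list[str], motif: str, center: int, window: int) -> float:
    """Fraction of promoters with at least one motif match inside the positional window."""
    if not promoters:
        return 0.0
    with_hit = sum(1 for p in promoters if positional_hits(p, motif, center, window))
    return with_hit / len(promoters)


if __name__ == "__main__":
    # TATA occurs at -18 and -8 here; only -8 falls within 5 bp of -10.
    print(positional_hits("GGTATAAAGGGGTATAAACC", "TATA", center=-10, window=5))  # [-8]
```

Comparing `positional_frequency` for a set of co-expressed genes against a background set is one simple way to ask whether a motif is enriched at a particular position rather than anywhere in the promoter.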
CisExpress locates DNA motifs in two steps. In the first, borrowed from its parent program, the tool detects so-called "seed motifs" to determine "the consensus sequences of motifs and their approximate position in the promoter region," the paper explains. In the second, it optimizes those motifs with a genetic algorithm (GA) that "determines the best possible motif model and motif position"; according to the team, this step is unique to CisExpress.
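The paper's GA optimizes a full motif model and its position, but the article gives no details of its encoding, genetic operators, or fitness function. The simplified sketch below swaps in a plainer representation: each individual is a (consensus k-mer, TSS-relative position) pair, refined from a seed by point mutation, one-base position shifts, and single-point crossover, with fitness taken as the fraction of co-expressed ("foreground") promoters that have a positional hit minus the fraction of background promoters that do. All names, parameters, and the fitness definition are hypothetical.

```python
import random

BASES = "ACGT"
Individual = tuple[str, int]  # (consensus k-mer, TSS-relative start position)


def hit(promoter: str, kmer: str, pos: int, window: int) -> bool:
    """True if `kmer` starts within `window` bases of TSS-relative `pos`.
    Assumes the promoter string ends at the TSS."""
    n = len(promoter)
    start = promoter.find(kmer)
    while start != -1:
        if abs((start - n) - pos) <= window:
            return True
        start = promoter.find(kmer, start + 1)
    return False


def fitness(ind: Individual, fg: list[str], bg: list[str], window: int) -> float:
    """Foreground positional hit rate minus background positional hit rate."""
    kmer, pos = ind
    fg_rate = sum(hit(p, kmer, pos, window) for p in fg) / len(fg)
    bg_rate = sum(hit(p, kmer, pos, window) for p in bg) / len(bg)
    return fg_rate - bg_rate


def mutate(ind: Individual, rng: random.Random) -> Individual:
    """Either replace one base of the k-mer or shift the position by one."""
    kmer, pos = ind
    if rng.random() < 0.5:
        i = rng.randrange(len(kmer))
        kmer = kmer[:i] + rng.choice(BASES) + kmer[i + 1:]
    else:
        pos += rng.choice((-1, 1))
    return kmer, pos


def crossover(a: Individual, b: Individual, rng: random.Random) -> Individual:
    """Single-point crossover on the k-mer; inherit one parent's position."""
    cut = rng.randrange(1, len(a[0]))
    return a[0][:cut] + b[0][cut:], rng.choice((a[1], b[1]))


def refine_seed(seed: Individual, fg: list[str], bg: list[str], window: int = 3,
                pop_size: int = 30, generations: int = 40,
                rng_seed: int = 0) -> tuple[Individual, float]:
    """Evolve a seed motif and position; return the best individual and its fitness."""
    rng = random.Random(rng_seed)

    def score(ind: Individual) -> float:
        return fitness(ind, fg, bg, window)

    population = [seed] + [mutate(seed, rng) for _ in range(pop_size - 1)]
    for _ in range(generations):
        ranked = sorted(population, key=score, reverse=True)
        elite = ranked[:max(1, pop_size // 5)]    # best individuals survive unchanged
        parents = ranked[:max(2, pop_size // 2)]  # only the fitter half breeds
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = elite + children
    best = max(population, key=score)
    return best, score(best)


if __name__ == "__main__":
    # Toy data (hypothetical): a G-box-like CACGTG planted 10 bp upstream of the TSS.
    fg = ["A" * 20 + "CACGTG" + "AAAA", "T" * 20 + "CACGTG" + "TTTT"]
    bg = ["A" * 30, "T" * 30]
    print(refine_seed(("CACGTT", -12), fg, bg))
```

Keeping the top-ranked individuals unchanged each generation (elitism) means the best fitness can never fall below that of the starting seed, so under this fitness measure the GA step can only refine the seed motif, not degrade it.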
The researchers claim the resulting program is more accurate than existing tools such as MatrixReduce, an algorithm used to discover the sequence-specific binding affinities of transcription factors. In their paper, Murphy et al. report that in comparison tests of the two programs on 11 gene expression datasets from Arabidopsis thaliana, CisExpress produced more specific position weight matrices (PWMs) and lower p-values than MatrixReduce. They attribute this "to its use of motif position information and the addition of the GA optimization step to tune PWMs."
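For readers unfamiliar with the two measures in this comparison: a PWM scores each base at each motif position, commonly as a log-odds ratio against background base frequencies, and a p-value expresses how surprising a motif's enrichment among co-expressed genes would be by chance. The sketch below builds a log2-odds PWM from aligned sites and computes a one-sided hypergeometric enrichment p-value. Neither the article nor this sketch specifies the statistic CisExpress actually uses; the hypergeometric test, the pseudocount of 0.5, the uniform background, and all function names are assumptions chosen as simple, common options.

```python
import math
from typing import Optional

BASES = "ACGT"


def build_pwm(sites: list[str], pseudocount: float = 0.5,
              background: Optional[dict[str, float]] = None) -> list[dict[str, float]]:
    """Build a log2-odds position weight matrix from equal-length aligned sites."""
    if background is None:
        background = {b: 0.25 for b in BASES}  # assumed uniform background
    total = len(sites) + 4 * pseudocount
    pwm = []
    for i in range(len(sites[0])):
        column = [site[i] for site in sites]
        pwm.append({
            b: math.log2(((column.count(b) + pseudocount) / total) / background[b])
            for b in BASES
        })
    return pwm


def score(pwm: list[dict[str, float]], kmer: str) -> float:
    """Sum the per-position log-odds scores for a k-mer of the PWM's length."""
    return sum(col[base] for col, base in zip(pwm, kmer))


def best_window_score(pwm: list[dict[str, float]], seq: str) -> float:
    """Highest PWM score over every window of `seq` (forward strand only)."""
    w = len(pwm)
    if len(seq) < w:
        raise ValueError("sequence is shorter than the motif")
    return max(score(pwm, seq[i:i + w]) for i in range(len(seq) - w + 1))


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """One-sided enrichment p-value P(X >= k), X ~ Hypergeometric(N, K, n):
    N genes in total, K carrying the motif, n genes in the co-expressed cluster."""
    hits = sum(math.comb(K, i) * math.comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return hits / math.comb(N, n)
```

A more specific PWM gives sharply higher scores to bases that match the motif, which separates true sites from background more cleanly; for enrichment, `hypergeom_sf(k, N, K, n)` gives the probability of seeing at least `k` motif-bearing genes in a cluster of `n` when `K` of `N` genes genome-wide carry the motif.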
Currently, CisExpress contains motif data from A. thaliana and the oil palm, although the developers plan to add data from other plant and non-plant species at a later date. They also plan to add an algorithm for predicting promoters later this year, Murphy said.
The researchers have also launched a company, MyRegulome, to provide commercial services based on CisExpress to firms in the medical and agricultural markets, Murphy told BioInform.
They plan to start offering services early next year, he said.