NEW YORK (GenomeWeb News) – A team of French and Italian researchers has developed a pipeline for identifying transcription factor binding sites using information from positional weight matrices (PWMs), genomic profiling, and expression data. Their work is scheduled to appear online this week in the Proceedings of the National Academy of Sciences.
The researchers created a PWM to computationally predict high-specificity binding sites. They then incorporated comparative genomic, gene expression and other data to home in on the sites most likely to be functional. Using Stat3 as an example, they showed that they could pinpoint known and novel Stat3 binding sites in the mouse genome that are consistent with the transcription factor's known cellular roles and with Stat3 binding patterns as measured by chromatin immunoprecipitation.
"Given its high validation rate, and the availability of large transcription factor-dependent gene expression datasets obtained under diverse environmental conditions," the authors wrote, "our approach appears to be a valid alternative to high-throughput experimental assays for the discovery of novel direct targets of transcription factors."
Stat3, a member of the "signal transducers and activators of transcription," or STAT protein family, has previously been linked to processes such as inflammation, cellular proliferation, immunity, and oncogenesis. Following activation by cytokine receptors, growth factors, or oncogenes, Stat3 can induce a variety of target genes, depending on the cell type and cellular conditions.
To test their method and uncover new Stat3 binding sites in the mouse genome, senior author Valeria Poli, a molecular biologist at the University of Turin, and her colleagues built a PWM from 54 characterized Stat3 binding sites. Combined with background nucleotide frequencies, the PWM allowed the researchers to predict Stat3 binding sites computationally.
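The article does not spell out the scoring details, but a PWM combined with background nucleotide frequencies is conventionally used as a log-odds scorer. The sketch below assumes that convention: it builds a log2-odds matrix from aligned sites with a pseudocount, then scans a sequence and reports windows scoring above a threshold. The example sites (loosely modeled on the TTCnnnGAA pattern often cited for STAT factors), the pseudocount of 0.5, the background frequencies, and the names `build_log_odds_pwm`, `score_window`, and `scan` are all hypothetical, and only the forward strand is scanned.

```python
import math

BASES = "ACGT"

# Hypothetical aligned sites for illustration; NOT the 54 sites used in the study.
EXAMPLE_SITES = [
    "TTCCCGGAA",
    "TTCCTGGAA",
    "TTCTCGGAA",
    "TTCCAGGAA",
    "TTACCGGAA",
]


def build_log_odds_pwm(sites, background, pseudocount=0.5):
    """Return one {base: log2(p_site / p_background)} dict per motif position."""
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("all sites must be aligned to the same width")
    pwm = []
    for pos in range(width):
        # Pseudocounts keep bases never seen at this position from giving log(0).
        counts = {base: pseudocount for base in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pwm.append({base: math.log2((counts[base] / total) / background[base])
                    for base in BASES})
    return pwm


def score_window(pwm, window):
    """Sum per-position log-odds scores for a window as wide as the PWM."""
    return sum(column[base] for column, base in zip(pwm, window))


def scan(pwm, sequence, threshold):
    """Yield (start, score) for forward-strand windows scoring at or above threshold."""
    sequence = sequence.upper()
    width = len(pwm)
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if set(window) <= set(BASES):  # skip windows containing N or other symbols
            score = score_window(pwm, window)
            if score >= threshold:
                yield start, score


# Hypothetical uniform background; a real genome-wide scan would use genomic frequencies.
background = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
pwm = build_log_odds_pwm(EXAMPLE_SITES, background)
for start, score in scan(pwm, "AAAATTCCCGGAAAAAA", threshold=5.0):
    print(start, round(score, 2))
```

The threshold of 5.0 is arbitrary here; in practice it would be set from the score distribution, for example relative to the maximum achievable score, which is what trades sensitivity against the number of candidate sites.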
Using this approach, the researchers turned up nearly 1.4 million potential Stat3 binding sites in the mouse genome. They subsequently narrowed their search using comparative genomics to look for phylogenetic conservation between potential mouse Stat3 binding sites and those in seven other vertebrate species. They also compared these results with Stat3 binding sites identified in two cell lines in previous ChIP-Seq studies.
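One way to carry out the two comparisons described above is sketched here, assuming each candidate site has already been annotated with the species in which an orthologous predicted site was found, and that ChIP-Seq peaks are available as (chromosome, start, end) intervals. The `CandidateSite` class, the helper names, and the species labels are hypothetical; the article does not say how alignments or overlaps were computed.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class CandidateSite:
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive
    # Hypothetical annotation: other species with an orthologous predicted site.
    conserved_in: Set[str] = field(default_factory=set)


def group_peaks(peaks: List[Tuple[str, int, int]]) -> Dict[str, List[Tuple[int, int]]]:
    """Group half-open (chrom, start, end) ChIP-Seq peaks by chromosome."""
    by_chrom: Dict[str, List[Tuple[int, int]]] = {}
    for chrom, start, end in peaks:
        by_chrom.setdefault(chrom, []).append((start, end))
    return by_chrom


def overlaps_peak(site: CandidateSite,
                  peaks_by_chrom: Dict[str, List[Tuple[int, int]]]) -> bool:
    """True if the site's interval overlaps any peak on the same chromosome."""
    return any(site.start < p_end and p_start < site.end
               for p_start, p_end in peaks_by_chrom.get(site.chrom, []))


def annotate_sites(sites, peaks_by_chrom):
    """Return (site, number of species with a conserved site, ChIP-Seq support) tuples."""
    return [(s, len(s.conserved_in), overlaps_peak(s, peaks_by_chrom)) for s in sites]


def conserved_sites(sites, min_species):
    """Keep sites conserved in at least `min_species` other species."""
    return [s for s in sites if len(s.conserved_in) >= min_species]


# Hypothetical toy data.
sites = [
    CandidateSite("chr1", 100, 109, {"human", "rat"}),
    CandidateSite("chr1", 5000, 5009, set()),
    CandidateSite("chr2", 300, 309, {"dog"}),
]
peaks = group_peaks([("chr1", 90, 150), ("chr2", 1000, 1200)])
for site, n_species, chip in annotate_sites(sites, peaks):
    print(site.chrom, site.start, n_species, chip)
```

The linear scan over peaks keeps the sketch short; applied to roughly 1.4 million candidates, an interval tree or a sorted sweep over both lists would be the more practical choice.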
By further limiting their search to predicted binding sites within 10,000 bases upstream of transcription start sites, the team selected specific candidate sites for experimental validation.
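The upstream filter depends on strand: for a gene on the minus strand, "upstream" means higher genomic coordinates. A minimal sketch, assuming 0-based half-open coordinates, a TSS given as the position of the first transcribed base, and that a site must lie entirely inside the window; the article does not say how sites straddling the boundary were handled, and `within_upstream_window` is a hypothetical helper name.

```python
def within_upstream_window(site_start: int, site_end: int, tss: int,
                           strand: str, window: int = 10_000) -> bool:
    """True if [site_start, site_end) lies within `window` bp upstream of the TSS.

    Coordinates are 0-based and half-open; `tss` is the first transcribed base.
    """
    if site_start >= site_end:
        raise ValueError("site_start must be less than site_end")
    if strand == "+":
        # Upstream of a plus-strand gene: [tss - window, tss)
        return tss - window <= site_start and site_end <= tss
    if strand == "-":
        # Upstream of a minus-strand gene: [tss + 1, tss + 1 + window)
        return tss + 1 <= site_start and site_end <= tss + 1 + window
    raise ValueError(f"unknown strand: {strand!r}")


print(within_upstream_window(9_000, 9_009, tss=10_000, strand="+"))
print(within_upstream_window(5_001, 5_010, tss=5_000, strand="-"))
```

Both branches define a window of exactly `window` bases, so plus- and minus-strand genes are filtered symmetrically.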
Their results suggest that there is relatively little overlap between Stat3 binding sites in different biological systems. Of the 9,648 genes they found to have potential binding sites in at least one species, fewer than half (4,339 genes) appeared to have conserved Stat3 binding sites in at least two of the species.
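The gene-level counts in the paragraph above amount to a tally over per-gene species sets. This sketch assumes each gene is mapped to the set of species in which a predicted site was found near it; the gene and species names are hypothetical, and because the article does not say whether the mouse itself counts toward the species total, the threshold is left as a parameter. The final line checks the article's own figures.

```python
from typing import Dict, Set, Tuple


def conservation_tally(gene_species: Dict[str, Set[str]],
                       min_species: int = 2) -> Tuple[int, int]:
    """Return (#genes with a site in >= 1 species, #genes with a site in >= min_species)."""
    with_any = sum(1 for species in gene_species.values() if species)
    conserved = sum(1 for species in gene_species.values() if len(species) >= min_species)
    return with_any, conserved


# Hypothetical toy input.
toy = {
    "gene_a": {"mouse", "human"},
    "gene_b": {"mouse"},
    "gene_c": set(),
    "gene_d": {"mouse", "rat", "dog"},
}
print(conservation_tally(toy))  # (3, 2)

# The article's figures: conserved genes as a share of genes with any predicted site.
print(f"{4339 / 9648:.1%}")     # 45.0%
```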
When the team carried out functional analyses of these genes based on gene ontology terms, it found over-representation of genes involved in processes ranging from development and transcription factor activity to intracellular signaling, cell-cell signaling, motility, and adhesion. Consistent with Stat3's role in oncogenesis, the researchers also noted that many of the Stat3-linked genes had roles in tumor transformation, metastasis, and growth.
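The article does not name the statistical test behind the gene ontology over-representation analysis. A common choice for this kind of analysis is a one-sided hypergeometric test, sketched here under that assumption with only the standard library. The function name and the example counts are hypothetical, and a real analysis would also correct for testing many GO terms at once.

```python
from math import comb


def hypergeom_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided P(X >= k) for X ~ Hypergeometric(N, K, n).

    k: genes in the target list annotated with the GO term
    n: size of the target gene list
    K: background genes annotated with the term
    N: size of the background gene universe
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("inconsistent counts")
    # math.comb returns 0 when the lower argument exceeds the upper,
    # so impossible terms contribute nothing to the sum.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / comb(N, n)


# Hypothetical counts: 40 of 500 target genes carry a term that
# 400 of 20,000 background genes carry (about 10 expected by chance).
print(hypergeom_enrichment_p(k=40, n=500, K=400, N=20_000))
```

Exact integer arithmetic avoids underflow in the intermediate terms; only the final ratio is converted to a float.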
Overall, of the 14 newly detected binding sites they tested, the team confirmed Stat3 binding at a dozen. And in general, the proposed binding sites fell in genes whose expression varied in a Stat3-dependent manner, suggesting that the computational strategy turned up real Stat3 targets in the mouse genome.
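The validation figure above works out as follows (simple arithmetic on the article's numbers, not a statistic reported by the authors):

```latex
\text{validation rate} = \frac{12}{14} \approx 0.857 \approx 86\%
```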
Compared with experimental methods such as ChIP-Seq, the authors noted, the approach "has the advantage of providing lists of [binding sites] independent of the cellular context." Beyond the new Stat3 binding sites identified in this study, the team pointed out that the method could be used to identify targets of other transcription factors.
"The Stat3 targets identified in this work may represent previously unrecognized mediators of Stat3 pro-oncogenic functions," the authors concluded. "For the many [transcription factors] involved in pathological processes, our method can thus help understanding the molecular mechanisms underlying [transcription factor] physiological and pathological functions and identifying potential therapeutic targets among the regulated genes."