NEW YORK (GenomeWeb News) – A team of researchers from the University of Washington has developed a technique for mapping regulatory protein occupancy across the genome, without prior knowledge of the specific proteins involved.
The approach, which they dubbed "digital genomic footprinting," relies on a combination of DNase I cleavage and high-throughput, massively parallel sequencing. The team applied the method to the model organism Saccharomyces cerevisiae, or baker's yeast, identifying thousands of proposed protein binding regions in the genome.
The research, which appeared online yesterday in Nature Methods, suggests that digital genomic footprinting can be used, along with gene expression data and information from chromatin immunoprecipitation experiments, to characterize regulatory protein functions and binding patterns.
"[W]e coupled DNase I digestion of intact nuclei with massively parallel sequencing and computational analysis of cleavage patterns at single-nucleotide resolution to disclose the in vivo occupancy sites of DNA-binding proteins genome-wide," senior author John Stamatoyannopoulos, a genome sciences researcher at the University of Washington, and his colleagues wrote.
Regulatory proteins, such as transcriptional activators and repressors, help control gene expression in response to cellular and environmental cues. But mapping the positions of these proteins on the genome can be tricky.
Several decades ago, scientists discovered that treating cells with the enzyme DNase I could help locate such sites, since DNA occupied by proteins is shielded from DNase I-induced cleavage. Even so, Stamatoyannopoulos and his team noted, that approach is time-consuming and hard to apply systematically.
On the other hand, they noted, more genome-wide approaches, such as chromatin immunoprecipitation combined with DNA sequencing or microarrays, can provide insights into regulatory protein occupancy across the genome but require prior knowledge of the proteins involved and may not provide fine-scale resolution of the nucleotides involved.
To overcome such problems, the researchers came up with a method that combines DNase I cleavage with massively parallel sequencing — an approach they demonstrated in yeast.
First, the team digested yeast nuclei with DNase I, isolated bits of DNA that were less than 300 base pairs long and had been cut by DNase I at both ends, and sequenced these by massively parallel sequencing using the Illumina Genome Analyzer I.
After they had filtered out telomeric regions, transposable elements, tRNA genes, rDNA genes, and so on, the researchers were left with nearly 24 million end reads that uniquely localized to the yeast genome.
By looking at how many times each nucleotide was cleaved by the enzyme, the team could pinpoint spots that were relatively resistant to cleavage and likely occupied by regulatory proteins.
They called these protected areas DNA-binding protein "footprints." A computer algorithm uncovered 4,384 such footprints in intergenic regions of the yeast genome using a five percent false discovery rate. Not surprisingly, such footprints tended to occur upstream of transcriptional start sites or in sites with known DNA-binding motifs.
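The idea of counting cleavage events per nucleotide and flagging protected stretches can be illustrated with a minimal sketch. This is not the authors' algorithm, which the article does not describe in detail and which included a false discovery rate estimate. The function names, the window width, the flank size, and the ratio threshold below are all assumptions chosen for a toy example.

```python
from collections import Counter

def cleavage_counts(cut_positions, length):
    """Tally how many times each nucleotide position was cut."""
    tally = Counter(cut_positions)
    return [tally.get(i, 0) for i in range(length)]

def find_footprints(counts, width=10, flank=5, min_ratio=4.0):
    """Return half-open (start, end) windows whose mean cleavage is far
    below that of the flanking positions. All thresholds are illustrative
    assumptions, not values from the study."""
    hits = []
    for start in range(flank, len(counts) - width - flank + 1):
        end = start + width
        core = sum(counts[start:end]) / width
        left = sum(counts[start - flank:start]) / flank
        right = sum(counts[end:end + flank]) / flank
        flank_mean = (left + right) / 2
        # Add a pseudocount of 1 so a fully protected core does not divide by zero.
        if flank_mean / (core + 1.0) >= min_ratio:
            hits.append((start, end))
    return hits

# Toy data: heavy cutting at positions 0-4 and 15-19, none at 5-14.
cuts = [p for p in list(range(5)) + list(range(15, 20)) for _ in range(20)]
counts = cleavage_counts(cuts, 20)
print(find_footprints(counts))
```

On real data, overlapping candidate windows would need to be merged and scored against a background model before any false discovery rate could be assigned; the sketch skips both steps.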
"Whereas ChIP requires that each DNA-binding protein first be interrogated by genome-wide location analysis and can be carried out for only one protein at a time, DNase I footprinting addresses all factors simultaneously in their native state and detects regions of direct binding with nucleotide precision," the authors explained.
Interestingly, more than a third of the 4,384 intergenic footprints in the yeast genome overlapped with protein-binding areas described in previous ChIP studies. For instance, the team found motifs recognized by yeast regulatory proteins such as Reb1, Abf1, and Hsp1.
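A comparison of this kind comes down to interval overlap. The sketch below shows one simple way to compute the share of footprints that intersect any previously reported binding region. It assumes half-open (start, end) coordinates on a single chromosome, and the example coordinates are invented for illustration, not taken from the study.

```python
def overlaps(a, b):
    """True if half-open intervals a and b share at least one position."""
    return a[0] < b[1] and b[0] < a[1]

def fraction_overlapping(footprints, chip_regions):
    """Fraction of footprints that intersect at least one ChIP region."""
    if not footprints:
        return 0.0
    hit = sum(
        1 for fp in footprints
        if any(overlaps(fp, region) for region in chip_regions)
    )
    return hit / len(footprints)

# Hypothetical coordinates, for illustration only.
footprints = [(10, 20), (50, 60), (100, 110)]
chip_regions = [(15, 40), (200, 300)]
print(fraction_overlapping(footprints, chip_regions))
```

The nested scan is quadratic in the number of intervals, which is adequate for a few thousand footprints; a sorted sweep or an interval tree would be the usual choice at larger scale.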
In addition, their method helped them refine their view of some regulatory protein binding consensus sequences and learn more about how regulatory protein occupancy in the genome affects nearby gene expression.
Based on subsequent experiments, the team speculated that the footprint approach identified many real binding sites, but may have missed some authentic protein-DNA interaction sites that didn't meet their selection criteria. They suggested that additional, lower affinity binding sites could likely be uncovered through additional DNase I cleavage experiments.
"The resulting maps provided gene-by-gene views of transcription factor binding and related cis-regulatory phenomena at the resolution of individual factor binding sites," the authors wrote. "This degree of detail was sufficient to define regulatory factor binding motifs de novo, and to correlate factor occupancy patterns with higher-level features such as chromatin remodeling, gene expression, and chromatin modifications."
The researchers argued that the digital genomic footprinting approach could be applied to a variety of cellular conditions — for instance, to look at how regulatory protein occupancy in the genome changes with different growth conditions or cell cycle stages.
In addition, they noted, digital genomic footprinting should prove useful not only for expanding researchers' understanding of well-studied genomes but also as a "powerful tool for annotation of the genomes of diverse organisms, about which little is known beyond the genome sequence itself."