NEW YORK (GenomeWeb News) – In a paper appearing online last night in the Proceedings of the National Academy of Sciences, researchers from Germany, Croatia, and Norway report that they have developed a bioinformatics approach for predicting gene expression levels in human cells from histone modification levels.
The team developed their quantitative models by bringing together data on gene expression and histone modification levels in human CD4+ T-cells. They found that histone modification levels corresponded to expression levels, but that only a subset of the histone data was needed to predict gene expression. The researchers also reported that different sets of histone modifications were informative for genes with high-CpG and low-CpG promoters.
"We have shown that the levels of histone modifications at a promoter proximal region are well correlated to the expression of genes," senior author Martin Vingron, head of the Max Planck Institute for Molecular Genetics' computational molecular biology department, and his co-authors wrote.
Past research suggests modifications affecting histones can influence everything from DNA replication and repair to transcription, the researchers noted. "[T]here are established links between the distinct steps in the transcription cycle and some histone modifications," they wrote. "However, in general, little is known about the relationship between histone modifications and the transcriptional process."
Still, they suspected they might be able to gauge gene expression by looking at histone modifications. To explore this further, Vingron and his co-workers assessed data on 38 different histone modifications and one histone variant from published ChIP-seq studies of CD4+ T-cells.
Specifically, the researchers looked at the number of histone modification tags within a few thousand bases of transcription start sites for nearly 15,000 RefSeq genes, combining this information with microarray data showing transcript levels for these genes. From there, the team came up with quantitative models for predicting expression from the histone modifications.
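The tag-counting step described above can be illustrated with a minimal sketch. It assumes sorted tag positions on a single chromosome for a single modification. The function name, the 2,000-base window, and the coordinates are hypothetical and chosen only for illustration; the article says only "a few thousand bases."

```python
from bisect import bisect_left, bisect_right


def count_tags_near_tss(tag_positions, tss, window=2000):
    """Count tags within `window` bases of `tss` (bounds inclusive).

    `tag_positions` must be sorted in ascending order.
    The default window of 2000 is an assumption, not a value from the paper.
    """
    lo = bisect_left(tag_positions, tss - window)
    hi = bisect_right(tag_positions, tss + window)
    return hi - lo


# Toy, made-up tag positions (not real ChIP-seq data).
tags = sorted([950, 1200, 2900, 3100, 5000, 5050, 9000])

# Hypothetical transcription start sites for two placeholder genes.
genes = {"geneA": 3000, "geneB": 8000}

counts = {name: count_tags_near_tss(tags, tss) for name, tss in genes.items()}
print(counts)
```

Repeating this per modification would give one row of counts per gene, which could then be paired with that gene's measured transcript level.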
Using these models, they found that by looking at the levels of just a few histone modifications at a gene's promoter, they could get insights into the expression of the gene.
"[C]ombinations of only two or three modifications are sufficient to build models that give rise to at least 95 [percent] of the performance obtained by using all modifications," the researchers wrote.
In particular, the researchers noted, four modifications — H4K20me1, H3K27ac, H3K79me1, and H2BK5ac — turned up in many of the models and seem to be closely linked to expression.
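One way to picture the comparison between small models and the full model is the sketch below. It assumes ordinary least-squares regression and a simple train/test split. The article does not specify the model type, so both are illustrative choices, and all data are synthetic, with placeholder modifications indexed 0 to 4.

```python
import itertools
import math
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def pearson(u, v):
    """Pearson correlation coefficient between two equal-length sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    var_u = sum((a - mu) ** 2 for a in u)
    var_v = sum((b - mv) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v)


def score(rows, y, cols, n_train):
    """Fit least squares on the first n_train genes using only the columns
    in `cols`; return Pearson r between predictions and y on the rest."""
    design = [[1.0] + [row[c] for c in cols] for row in rows]
    train, y_train = design[:n_train], y[:n_train]
    p = len(cols) + 1
    xtx = [[sum(r[i] * r[j] for r in train) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(train, y_train)) for i in range(p)]
    beta = solve(xtx, xty)
    pred = [sum(b * v for b, v in zip(beta, r)) for r in design[n_train:]]
    return pearson(pred, y[n_train:])


# Synthetic data: five placeholder "modifications" that all track one hidden
# activity level per gene, so they carry largely redundant information.
random.seed(1)
N_GENES, N_MODS, N_TRAIN = 400, 5, 300
latent = [random.gauss(0, 1) for _ in range(N_GENES)]
signals = [[a + random.gauss(0, 0.5) for _ in range(N_MODS)] for a in latent]
expression = [2.0 * a + random.gauss(0, 0.5) for a in latent]

full = score(signals, expression, range(N_MODS), N_TRAIN)
pairs = {
    combo: score(signals, expression, combo, N_TRAIN)
    for combo in itertools.combinations(range(N_MODS), 2)
}
best_pair = max(pairs, key=pairs.get)
print("full model r:", round(full, 3))
print("best pair:", best_pair, "fraction of full:", round(pairs[best_pair] / full, 3))
```

Because the synthetic signals are redundant by construction, a pair of them recovers most of the full model's correlation. This only mimics the kind of pattern the researchers describe and says nothing about the real data. Picking the best pair on the held-out genes is also optimistic; a more careful analysis would select the pair on separate data.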
The team also detected differences in the types of modifications associated with specific types of promoters. For instance, they noted that H3K4me3 and H3K79me1 modifications were most useful for predicting the expression of genes with low CpG content promoters, while H3K27ac and H4K20me1 levels were most informative for predicting the expression of genes with high CpG content promoters.
And when they tested models involving nine histone modifications in two other human cell types, the researchers found that they could accurately predict gene expression in these cells as well.
The team emphasized that it's still unclear whether the histone modifications they measured are actually altering transcription or whether a gene's transcriptional status influences its histone modifications. Even so, they noted, the relationship between the two processes hints at a link between the RNA polymerase II enzyme and histone modifications, and exploiting this association should aid future studies of these processes.
"[W]e can pinpoint a small number of modifications whose levels at the promoter can be used to infer gene expression and hence provide some information about the transcriptional process," they concluded, "which reduces the experimental effort to study the relationship between histone modifications and transcription."