Yale University's Mark Gerstein and his co-authors from the US, UK, Spain, and Switzerland take on pseudogenes in a Genome Biology paper produced as part of the international GENCODE project. Using a combination of manual curation, in silico annotation pipelines, and RT-PCR-sequencing validation experiments, Gerstein and the group defined a set of pseudogenes in the human genome. Folding in ENCODE functional data made it possible to assess pseudogene expression levels along with the transcription factors, chromatin marks, and RNA polymerase binding patterns associated with them, while 1000 Genomes Project information helped track down pseudogenes under selection. "At one extreme," they report, "some pseudogenes possess conventional characteristics of functionality; these may represent genes that have recently died. On the other hand, we find interesting patterns of partial activity, which may suggest that dead genes are being resurrected as functioning non-coding RNAs."
In an ENCODE project paper, another international team led by Yale's Gerstein describes its statistical approach for classifying regions of the genome associated with human transcription factors characterized in other branches of ENCODE. The team brought together data for more than 100 known or suspected transcription factors studied in various human cell types and used this information to define three types of regions with distinct transcription factor binding patterns, chromosomal and chromatin patterns, and cell type specificity. "Our machine learning approach enables us to identify features potentially general to all transcription factors," the team notes, "including those not included in the data."
Still another of the ENCODE studies appearing in Genome Biology looks at ways of using chromatin features to predict gene expression patterns. An international team led by investigators at the University of Massachusetts came up with a two-step model for relating chromatin profiles and gene expression to one another in diverse cellular situations, using information on the expression of genes described by GENCODE and chromatin features identified through ENCODE. "Our study not only confirms that the general relationships found in previous studies hold across various cell lines, but also makes new suggestions about the relationship between chromatin features and gene expression levels," University of Massachusetts' Zhiping Weng, the study's senior author, and colleagues write. "We found that expression status and expression levels can be predicted by different groups of chromatin features, both with high accuracy."
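For readers curious what a two-step model of this kind might look like in practice, here is a minimal sketch. It is not the authors' implementation: the choice of logistic regression for the first step and ordinary least squares for the second, the synthetic "chromatin feature" matrix, and every function and variable name are assumptions made purely for illustration. The first step predicts whether a gene is expressed at all, and the second estimates the expression level for genes called as expressed.

```python
import numpy as np


def add_intercept(X):
    """Prepend a column of ones so the models can learn an offset."""
    return np.hstack([np.ones((X.shape[0], 1)), X])


def fit_status_model(X, is_on, lr=0.1, steps=2000):
    """Step 1 (assumed): logistic regression fitted by gradient descent."""
    Xb = add_intercept(X)
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(Xb @ w)))
        w -= lr * Xb.T @ (p - is_on) / len(is_on)
    return w


def fit_level_model(X, level):
    """Step 2 (assumed): least-squares regression on expressed genes only."""
    w, *_ = np.linalg.lstsq(add_intercept(X), level, rcond=None)
    return w


def predict_two_step(X, w_status, w_level):
    """Call each gene on/off, then assign a level only to genes called on."""
    Xb = add_intercept(X)
    on = (Xb @ w_status) > 0
    return on, np.where(on, Xb @ w_level, 0.0)


# Synthetic data, NOT real ENCODE or GENCODE measurements: four made-up
# chromatin features, where features 0-1 drive on/off status and
# features 2-3 drive the level among expressed genes.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
is_on = (X[:, 0] + X[:, 1] + 0.1 * rng.normal(size=500)) > 0
level = np.where(
    is_on, 5 + 2 * X[:, 2] - X[:, 3] + 0.1 * rng.normal(size=500), 0.0
)

w_status = fit_status_model(X, is_on.astype(float))
w_level = fit_level_model(X[is_on], level[is_on])
pred_on, pred_level = predict_two_step(X, w_status, w_level)

accuracy = np.mean(pred_on == is_on)
print("status accuracy:", accuracy)
print("level coefficients:", w_level)
```

In this toy setup the two steps end up relying on different feature groups by construction, which loosely mirrors the paper's observation that expression status and expression level are predicted by different sets of chromatin features.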