In a paper published online in advance in Genome Research this week, the European Bioinformatics Institute's Ewan Birney and his colleagues propose an approach for the efficient storage of re-sequencing data, based on compressing reference sequence information. "We align new sequences to a reference genome and then encode the differences between the new sequence and the reference genome for storage," Birney et al. write, adding that with this compression method they've seen "exponential efficiency gains as read lengths increase."
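The idea of storing only the differences from a reference can be illustrated with a minimal Python sketch. This is not the authors' format or implementation: the function names `encode_read` and `decode_read` are hypothetical, and the sketch assumes an ungapped alignment whose position is already known, ignoring indels, quality scores, and unaligned reads.

```python
from typing import List, Tuple

# Hypothetical record layout: (start position, read length, mismatches),
# where each mismatch is (offset within the read, observed base).
Record = Tuple[int, int, List[Tuple[int, str]]]


def encode_read(reference: str, read: str, pos: int) -> Record:
    """Encode an ungapped read aligned at `pos` as its differences from the reference."""
    window = reference[pos:pos + len(read)]
    if len(window) != len(read):
        raise ValueError("read extends past the end of the reference")
    # Keep only the positions where the read disagrees with the reference.
    diffs = [(i, b) for i, (a, b) in enumerate(zip(window, read)) if a != b]
    return pos, len(read), diffs


def decode_read(reference: str, record: Record) -> str:
    """Rebuild the original read from the reference and a stored record."""
    pos, length, diffs = record
    bases = list(reference[pos:pos + length])
    for offset, base in diffs:
        bases[offset] = base
    return "".join(bases)


reference = "ACGTACGTACGT"
read = "GTACCTAC"  # matches reference[2:10] except at offset 4
record = encode_read(reference, read, 2)
restored = decode_read(reference, record)
```

In this toy case the eight-base read is stored as a position, a length, and a single mismatch, and `decode_read` recovers it exactly. A read that matches the reference perfectly would carry an empty mismatch list, which is the intuition behind savings that grow with read length.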
In Genome Research this week, Princeton University's David Stern and his colleagues describe a genotyping approach "based on multiplexed shotgun sequencing that can identify recombination breakpoints in a large number of individuals simultaneously at a resolution sufficient for most mapping purposes." Stern's team outlines its library construction protocol and its "Hidden Markov Model to estimate ancestry at all genomic locations in all individuals using these data." In "mapping more than 400 previously unassembled D. simulans contigs to linkage groups," the researchers found that their method "allows estimation of recombination breakpoints to a median of 38 kb intervals."
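As a rough illustration of how a Hidden Markov Model can place recombination breakpoints from sparse, noisy marker calls, the following Python sketch runs the Viterbi algorithm over two ancestry states. It is not the model described by Stern's team: the function name `viterbi_ancestry`, the two-state setup, and the `switch_prob` and `error_rate` values are all assumptions chosen for illustration.

```python
import math
from typing import List, Sequence


def viterbi_ancestry(
    obs: Sequence[int],
    switch_prob: float = 0.01,  # assumed chance of a breakpoint between adjacent markers
    error_rate: float = 0.05,   # assumed chance that a marker call is wrong
) -> List[int]:
    """Return the most likely ancestry state (0 or 1) at each marker.

    obs[i] is 0 or 1, indicating which parental allele was observed at marker i.
    """
    if not obs:
        return []
    log_stay = math.log(1.0 - switch_prob)
    log_switch = math.log(switch_prob)

    def log_emit(state: int, call: int) -> float:
        return math.log(1.0 - error_rate) if state == call else math.log(error_rate)

    # Log-probability of the best path ending in each state, with a uniform start.
    score = [math.log(0.5) + log_emit(s, obs[0]) for s in (0, 1)]
    back: List[List[int]] = []
    for call in obs[1:]:
        new_score, pointers = [], []
        for s in (0, 1):
            cands = [score[p] + (log_stay if p == s else log_switch) for p in (0, 1)]
            best = 0 if cands[0] >= cands[1] else 1
            pointers.append(best)
            new_score.append(cands[best] + log_emit(s, call))
        score = new_score
        back.append(pointers)

    # Trace back from the best final state to recover the full path.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


calls = [0, 0, 0, 1, 0, 0, 1, 1, 1, 1]
path = viterbi_ancestry(calls)
```

With these assumed parameters, the isolated `1` at the fourth marker is treated as a calling error rather than a pair of breakpoints, and the inferred path switches ancestry once, between the sixth and seventh markers.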
Investigators at the University of Rennes in France, and their colleagues at the Dana-Farber Cancer Institute and Dartmouth Medical School, report an epigenetic switch in a neural differentiation model in which "induction of FOXA1 expression and its subsequent recruitment to enhancers is associated with DNA de-methylation." At the same time, the authors write, histone H3 lysine 4 methylation is also induced at these enhancers, such that both epigenetic changes could work concomitantly to "stabilize FOXA1 binding and allow for subsequent recruitment of transcriptional regulatory effectors."
And in another Genome Research paper published online in advance, an international research team led by investigators at Vanderbilt University presents "a spatial and temporal map of C. elegans gene expression." The team tagged mRNAs in 30 cell types that represent each developmental stage and used tiling arrays to generate gene expression profiles for each. Using a machine learning-based approach, the team found "transcripts corresponding to established gene models and revealed novel transcriptionally active regions in non-coding domains that comprise at least 10 percent of the total C. elegans genome," and overall, that "about 75 percent of transcripts … are differentially expressed among developmental stages and across cell types."