This article has been corrected to reflect that MM-ChIP powers integrative analysis of inter-platform and inter-laboratory data. An earlier version of this article stated that it powers the integration of inter-platform and intra-laboratory data. Genome Technology apologizes for the error.
Members of Shirley Liu's lab at the Dana-Farber Cancer Institute were using ChIP-chip to characterize the genome-wide binding of the human estrogen receptor when they first decided to take action: "Several groups have published papers [on estrogen receptors] using different platforms, and some of them even on the same platform, and so we were just thinking: what would be the best way to combine all the data and come up with an integrative, or consensus, peak-calling from the different data sets?" Liu says.
To that end, Yiwen Chen, a postdoc in the Liu lab, designed MM-ChIP, a computational tool that accounts for inter-platform and inter-laboratory discrepancies based on a two-step normalization process. For ChIP-chip integration, MM-ChIP first works to remove the confounding effects of probe sequence and genome copy number on probe intensity, "which has been shown in other studies [to be] very effective to remove platform-systematic biases," Chen says. "In the second step, we combine window-based statistics from different data sets to get a confidence score using Stouffer's method," an approach developed in 1949 by the late American sociologist Samuel Andrew Stouffer.
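As a rough illustration of the second step, Stouffer's method in its unweighted form sums the z-scores from k data sets and divides by the square root of k, then reads the significance off a standard normal distribution. The sketch below assumes each data set has already produced a z-score for the same genomic window. The article does not say which window statistic or weighting MM-ChIP uses, so the function name `stouffer_combine` and the example values are hypothetical.

```python
import math


def stouffer_combine(z_scores):
    """Combine per-data-set z-scores for one window (unweighted Stouffer).

    Illustrative only: assumes the inputs are already z-scores and that
    every data set carries equal weight.
    """
    k = len(z_scores)
    if k == 0:
        raise ValueError("need at least one z-score")
    # Sum of k independent standard normals has variance k,
    # so dividing by sqrt(k) gives a standard normal again.
    combined_z = sum(z_scores) / math.sqrt(k)
    # One-sided upper-tail p-value of the standard normal distribution.
    p_value = 0.5 * math.erfc(combined_z / math.sqrt(2.0))
    return combined_z, p_value


# Hypothetical z-scores for one window, from four separate studies.
window_z = [2.0, 2.0, 2.0, 2.0]
combined_z, p_value = stouffer_combine(window_z)
print(combined_z, p_value)
```

Four studies that each show only modest enrichment yield a combined z-score of 4.0, which is the kind of gain in confidence that pooling evidence across data sets is meant to provide.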
Chen sums up the score-combining step: "Basically, you calculate the score for each data set and then add them together and calculate the statistical significance of this score."
For ChIP-seq data integration, MM-ChIP first estimates the library fragment size, which varies from one data set to another. Based on that estimate, MM-ChIP shifts the tags generated by individual studies according to fragment size. From there, "it's just pooling the shifted tags together and using the window-scanning method across the genome to look at what other regions are enriched with [those] tags," Chen says.
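The shift-then-pool idea can be sketched in a few lines. This version assumes each tag is a (position, strand) pair on one chromosome and that tags are shifted by half the estimated fragment size toward the fragment center, a common ChIP-seq convention that the article does not confirm for MM-ChIP. The function names, window size, and tag data are hypothetical, and the sketch only counts tags per window rather than testing enrichment against a background model.

```python
from collections import Counter


def shift_tags(tags, fragment_size):
    """Shift each tag toward the presumed fragment center.

    Assumption: shift by half the fragment size, downstream for '+'
    strand tags and upstream for '-' strand tags.
    """
    half = fragment_size // 2
    return [pos + half if strand == "+" else pos - half
            for pos, strand in tags]


def pooled_window_counts(datasets, window=100):
    """Pool shifted tags from all data sets and count them per window.

    `datasets` is a list of (tags, fragment_size) pairs, so each study
    is shifted by its own fragment-size estimate before pooling.
    """
    counts = Counter()
    for tags, fragment_size in datasets:
        for center in shift_tags(tags, fragment_size):
            counts[center // window] += 1
    return counts


# Two hypothetical studies with different fragment-size estimates.
study_a = ([(950, "+"), (960, "+"), (1150, "-")], 200)
study_b = ([(975, "+"), (1125, "-"), (5000, "+")], 150)

counts = pooled_window_counts([study_a, study_b])
for window_index, n in sorted(counts.items()):
    print(window_index * 100, n)
```

Because each study is shifted by its own fragment size before pooling, tags that flank the same binding site in both studies land in the same window even though the libraries differ.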
Liu, Chen, and their colleagues show in a February Genome Biology paper that MM-ChIP outperforms previously established methods for ChIP data integration, which, taken on their own, could not "do the integrative analysis of [data from] different platforms," Liu says.
Overall, Chen says that as the numbers of ChIP-chip and ChIP-seq data sets rise, MM-ChIP offers researchers "an empowering opportunity — making new discoveries by pooling information across the different data sets." He envisions MM-ChIP as a first-step approach to overcome platform-specific biases, and hopes to be able to integrate additional data sets, such as those generated by RNA-seq, in the future.
The MM-ChIP package is available for download at the Liu lab Web site, as well as at Cistrome.org, an integrative Web server with more than 300 users that the team launched last year. The team welcomes user feedback on both MM-ChIP and Cistrome. "We want to make it as useful as possible," Liu says.