NEW YORK (GenomeWeb) – A trio of papers released this week as part of the International Human Epigenome Consortium described new tools for handling epigenetic and epigenomic data.
A team led by researchers at the Spanish National Cancer Research Center noted in its paper that "[t]he impact of large and complex epigenomic datasets on biological insights or clinical applications is limited by the lack of accessibility by easy, intuitive, and fast tools."
Two papers, both published in Cell Systems, detailed new data portals to make epigenetic data more easily accessible. In one, a Canadian team described the IHEC portal, which provides access to data from seven international consortia, while in the other, the Spanish team outlined its Blueprint Data Analysis Portal, which provides an interface to compare the consortium's hematopoietic epigenomic data.
At the same time, an international team of researchers presented in Cell Reports its tool for the analysis and interpretation of epigenome-wide association study data.
These papers are part of a collection of more than 40 articles from the IHEC. This assemblage includes dozens of studies published this week in Cell Press and other journals as well as ones that came out earlier in the year. A number of the papers touch on the role of epigenetics in development and immunity as well as in conditions like autism and cancer.
As they described in their paper, researchers from McGill University and the University of Sherbrooke developed the IHEC Data Portal in order to integrate datasets from across various research consortia. Their portal provides access to data from seven international consortia — ENCODE, NIH Roadmap, CEEHRC, Blueprint, DEEP, AMED-CREST, and KNIH — that includes more than 7,000 epigenomic reference sets from more than 600 different tissues.
"The IHEC Data Portal is being built as a comprehensive discovery tool to enable the research community to share epigenomic data and collaborate more effectively," the researchers wrote.
In particular, the portal depends on the IHEC Data Hub JSON documents for the retrieval and distribution of the consortium data, and through an online API, its users can select and navigate through the various datasets. Users can then further explore the data with the UCSC Genome Browser. The Canadian team said that users could also rely on a correlation tool to compare selected datasets. Data can be downloaded as well, though raw data requires a data access request, the researchers noted. In addition, users can share the datasets they've selected and filtered using session IDs and URLs.
The researchers added that their strategy could also be adapted to integrate data generated by other consortia.
At the same time, a Spanish team developed its own portal for the analysis of data from the Blueprint Consortium. That consortium has generated reference epigenomes for hematopoietic cell lineages, and its dataset includes ChIP-seq, DNase I-seq, whole-genome bisulfite sequencing, and RNA-seq data and covers more than 60 cell types.
For the portal, the researchers turned to the epigenomics comparative cyber-infrastructure (EPICO) platform, which they said has five parts: a data model; data validation and loading programs; an empty database in which to store data and metadata from the data validation and loading programs; the API; and the data analysis portal itself. In addition to EPICO, this approach also requires storage space to create the database, a connection to fetch the primary data, and modules to receive queries and send results.
Altogether, the researchers said their portal, dubbed BDAP, allows users with little background in bioinformatics to visualize and compare epigenomic and transcriptomic data for blood cell types they are interested in. In particular, they tested their portal using two genes, FPR1 and IRF8. For both, the portal was able to highlight known changes in the expression of those genes in blood cells and relate them to concurrent epigenetic changes.
The Spanish team further said that EPICO could serve as a standard template to enable researchers to explore epigenomic data.
And to dive into that data more deeply, researchers led by University College London's Stephan Beck developed a new tool, called eFORGE, to allow users to sift through data from epigenome-wide association studies. In that way, they could uncover disease-relevant cell types, as Beck and his colleagues noted in Cell Reports this week.
The tool — eFORGE stands for experimentally derived Functional element Overlap analysis of ReGions from EWAS — gauges which differentially methylated positions are likely functional in certain tissues or cells. It does this by analyzing the overlap between a set of differentially methylated positions and reference maps of DNase I hypersensitive sites. Those reference sets include 454 samples from various tissues, primary cell types, and cell lines from the ENCODE, Roadmap Epigenomics, and Blueprint consortia.
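To make the overlap idea concrete, here is a minimal Python sketch that counts how many differentially methylated positions fall inside DNase I hypersensitive sites for each reference sample. It is an illustration only, not the eFORGE implementation: the function name, the sample names, and the coordinates are all hypothetical, the intervals are assumed to be sorted, non-overlapping, and on a single chromosome, and the sketch makes no attempt to assess whether an overlap is statistically enriched.

```python
from bisect import bisect_right


def count_overlaps(positions, intervals):
    """Count positions that fall inside any half-open [start, end) interval.

    Assumes `intervals` is sorted by start, non-overlapping, and that all
    coordinates refer to the same chromosome (a simplifying assumption).
    """
    starts = [start for start, _ in intervals]
    hits = 0
    for pos in positions:
        # Index of the last interval whose start is <= pos, or -1 if none.
        i = bisect_right(starts, pos) - 1
        if i >= 0 and pos < intervals[i][1]:
            hits += 1
    return hits


# Hypothetical toy data: DNase I hypersensitive sites per reference sample.
dhs_by_sample = {
    "sample_A": [(100, 200), (500, 650)],
    "sample_B": [(900, 1000)],
}

# Hypothetical differentially methylated positions from an EWAS.
dmps = [150, 510, 640, 700, 950]

# Overlap count per reference sample.
overlap_counts = {
    name: count_overlaps(dmps, sites) for name, sites in dhs_by_sample.items()
}
print(overlap_counts)
```

In this toy example, a sample with many overlapping positions would be a candidate for a tissue or cell type in which the positions are functional, though a real analysis would need to compare the counts against an appropriate background.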
Beck and his colleagues assessed their approach by applying it to 20 publicly available EWAS datasets. Through this, they noted a stem cell-like signature in five cancer EWAS and were able to home in on CD14+ cells from a heterogeneous sample from an EWAS of rheumatoid arthritis, a disease in which accelerated maturation of CD14+ cells has been noted.
"Our approach bridges the gap between large-scale epigenomics data and EWAS-derived target selection to yield insight into disease etiology," Beck and his colleagues wrote.