With up to €16 million ($20 million) in funding from the German Federal Ministry of Education and Research, a consortium of 21 German research groups plans to generate 70 reference epigenome maps over the next five years using next-generation sequencing.
The project, dubbed DEEP for Deutsches Epigenom-Programm, is part of the International Human Epigenome Consortium, which coordinates a number of epigenomic projects in the European Union, the US, Canada, and South Korea.
One of IHEC's primary goals is to analyze at least 1,000 epigenomes within seven to 10 years, producing histone modification maps, DNA methylation maps, transcription start site maps, and catalogs of small RNAs and non-coding RNAs. The consortium also wants to compare epigenome maps of model organisms relevant to human health and disease.
According to its website, IHEC will focus on key cellular states and will survey individuals, pedigrees, and identical twins in order to determine the relationship between genetic and epigenetic variation.
IHEC is coordinating its efforts with other international projects, such as the International Cancer Genome Consortium and the Encyclopedia of DNA Elements, or ENCODE, project. For example, while ENCODE focuses on defining functional DNA sequences in the genome, IHEC plans to "define the patterns of epigenetic regulation occurring at those sequences in different primary cells."
IHEC will encompass a variety of human cell and tissue types, including normal healthy cells; tissues from disease states such as cancer, obesity, atherosclerosis, autoimmune disease, autism, psychiatric disorders, asthma, and addiction; stem cells; and cells exposed to infection, toxins, or stress.
DEEP in particular will focus on metabolic and inflammatory diseases, such as obesity, fatty liver disease, bowel disease, and rheumatoid arthritis, and will compare cells from healthy and disease-affected individuals. The project, which started Sept. 1, will run for five years, with an evaluation after three years.
Six laboratories will be in charge of data production, all using either the Illumina HiSeq 2000 or HiSeq 2500 platform: RNA sequencing, including of small RNAs and large non-coding RNAs, will be conducted by researchers at the Max Delbrück Center for Molecular Medicine in Berlin and at the University of Kiel; chromatin modifications will be analyzed using ChIP-seq by groups at the Max Planck Institute for Molecular Genetics in Berlin and at the Max Planck Institute of Immunobiology and Epigenetics in Freiburg; and DNA methylation will be studied by whole-genome bisulfite sequencing by researchers at Saarland University and the University of Duisburg-Essen. In addition, the Saarland group will perform DNase hypersensitivity mapping.
According to Jörn Walter, a professor of epigenetics and genetics at Saarland University and the coordinator of the DEEP project, there were some initial concerns about distributing data production across so many centers, but the organizers decided to include many experts rather than just one center in order to be able to develop new technologies for the project.
The consortium settled on the HiSeq platform for data production because several DEEP participants were already using that technology. In addition, the platform is the main technology used in the Blueprint project, another IHEC project focusing on hematopoietic epigenomes that was funded with almost €30 million from the European Union last year (CSN 11/16/2011).
"At least for the next two to three years, we hope to be able to stick to this technology and to do most of our work based on that," Walter said. "We are not seeing any dramatic next-next generation sequencing change on a large-scale basis in the near future. All the technologies that are on the horizon have their own niche, but they are not really a competitor to what's already on the market … They are not genome wide."
All groups will be using standardized protocols recommended by IHEC, he said, and will implement their own processing pipeline. Two labs conducting the same type of analysis will also study a few of the same cell types in order to assess technical variation between them.
In addition, the project plans to develop a number of new methods, for example to identify allele-specific transcripts on a large scale, to study the role of methyl cytosine oxidation, to identify methylation patterns, to reduce the amount of material required for ChIP-seq, and to improve RNA mapping, in particular for non-coding RNAs.
For example, Walter said, DEEP participant Qiagen wants to reduce the amount of material used in bisulfite sequencing and DNase hypersensitivity analyses.
His own group, he said, has developed a new amplicon bisulfite sequencing method that uses hairpin technology in combination with the 454 platform to generate thousands of amplicon sequences in order to analyze the formation of complex tissue CpG methylation patterns on both strands.
All data will be produced in a uniform format, an effort coordinated by a group at the Max Planck Institute for Informatics in Saarbrücken, and the data will be stored at the German Cancer Research Center in Heidelberg prior to being deposited in public databases.
The cells and tissues to be analyzed will be provided by a number of research groups with clinical partners, Walter said, and several DEEP groups will conduct functional studies in either mouse models or tissue culture.
For example, Sanofi Aventis Höchst, a partner in the project, plans to conduct functional studies on fat metabolism using drugs that affect the metabolic state of fat cells, he said.
Walter said the main challenges will be to coordinate the distribution of cells, which "have to be in a very good state"; to standardize data production; and to develop tools for interpretation.
Another challenge will be communicating clearly how the data were processed, so that different datasets can be compared, an effort that also needs to be coordinated between different IHEC projects. "It must be transparent for people to follow how data were processed and evaluated," he said.