NEW YORK – Through the third phase of the Encyclopedia of DNA Elements (ENCODE) Project, researchers have uncovered millions of additional regulatory regions within the human genome.
In more than a dozen papers appearing in Nature and related journals on Wednesday, ENCODE researchers described the nearly 6,000 additional datasets they generated to search for functional elements within the human and mouse genomes.
The ENCODE project kicked off in 2003 shortly after the completion of the human genome sequence was announced. While the first phase of the project explored 1 percent of the human genome as a pilot, the second phase expanded to the entire genome and incorporated sequencing-based technologies. The third phase began in 2012 and included even more assays and cell types.
As detailed in the new publications, the ENCODE researchers analyzed about 500 cell or tissue samples, whereas previous iterations relied largely on cell lines, and developed a registry of cis-regulatory elements. They also generated maps of DNA accessibility and of where transcription factors and other proteins may bind the genome. At the same time, the ENCODE team developed web-based tools to enable other scientists to visualize the ENCODE data.
"The data generated in ENCODE 3 dramatically increase our understanding of the human genome," Brenton Graveley from UConn Health, a co-author of one of the studies, said in a statement. "The project has added tremendous resolution and clarity for previous data types, such as DNA-binding proteins and chromatin marks, and new data types, such as long-range DNA interactions and protein-RNA interactions."
Based on the 5,992 new experimental datasets they generated, the ENCODE Project researchers developed a registry of 926,535 human candidate cis-regulatory elements and 339,815 mouse candidate cis-regulatory elements, as they reported in their flagship Nature paper. This, they added, is a 22 percent increase over ENCODE phase 2 findings. The candidate elements could be categorized as enhancer-like, promoter-like, or CTCF-only. CTCF-occupied elements could be insulators, enhancer blockers, or chromatin loop anchor elements.
Other ENCODE papers explored the organization of the genome. Researchers led by Wouter Meuleman of the Altius Institute for Biomedical Sciences developed high-resolution maps of DNase I hypersensitive sites based on more than 700 human biological samples, indexing 3.6 million DHSs. At the same time, Jeff Vierstra, also from Altius, and colleagues generated high-density DNase I cleavage maps, while Christopher Partridge and colleagues from the HudsonAlpha Institute for Biotechnology mapped how 208 proteins — including 171 transcription factors — interact with the human genome using ChIP-seq.
Stanford University's Michael Snyder and his colleagues additionally used ChIA-PET to map chromatin loops in two dozen human cell types. They found that slightly more than a quarter of chromatin loops varied by cell type, and these variations appeared to be associated with changes in gene expression.
Meanwhile, researchers led by Eric Van Nostrand of the University of California, San Diego, focused on RNA-binding proteins, which also regulate gene expression. They turned to an approach dubbed eCLIP that uses UV light to crosslink RNA with proteins bound to it. With this approach, which they applied to 150 RNA-binding proteins, they further homed in on where proteins bind to RNA and began to tease out what their functions might be.
"Why they activate in one location and repress when they bind to another location is a longstanding puzzle," co-author Christopher Burge from the Massachusetts Institute of Technology said in a statement. "But having this set of maps may help researchers to figure out what protein features are associated with each pattern of activity."
The ENCODE Project also examined cis-regulatory elements in mice, particularly during development, which could give insight into human development. For instance, researchers at the Ludwig Institute for Cancer Research and elsewhere used a combination of ChIP-seq and ATAC-seq to generate a mouse chromatin accessibility map for 72 different tissue-stage combinations, while Yupeng He and colleagues of the Salk Institute for Biological Studies profiled the methylomes of 12 mouse tissues or organs at nine developmental stages. Overall, He and colleagues noted a general decline in CG methylation during fetal development. They further predicted more than 460,000 putative developmental tissue-specific enhancers.
Additionally, a team led by Caltech researchers profiled polyA-RNA in 17 mouse tissues and organs during fetal development. With the addition of single-cell and other data, they also began to predict which enhancers were active in which cell types.
A handful of other ENCODE papers appearing in Nature Communications and Nature Methods reported on the transcriptional activity of mouse pseudogenes, a custom annotation of ENCODE datasets to use in cancer applications, and an approach to predict active enhancers.
As part of the third phase, ENCODE project researchers developed a specialized browser called SCREEN. "A major priority of ENCODE 3 was to develop means to share data from the thousands of ENCODE experiments with the broader research community to help expand our understanding of genome function," Eric Green, the director of the National Human Genome Research Institute, which funded the project, said in a statement. "ENCODE 3 search and visualization tools make these data accessible, thereby advancing efforts in open science."
In a related commentary in Nature, Chung-Chau Hon and Piero Carninci of the RIKEN Center for Integrative Medical Sciences noted that "[t]his yet-to-be-completed encyclopedia has already become a quintessential tool for understanding gene regulation and genetic predisposition to disease."
The fourth phase of ENCODE is to further expand the range of cell types and tissues and to include single-cell transcriptomic and additional open-chromatin assays to better capture the heterogeneity of those cell and tissue types, as the project leaders noted in Nature.
Hon and Carninci added that they'd like to see the fourth phase include a systematic analysis to evaluate whether the cis-regulatory elements cataloged in this phase actually do what they have been predicted to do.