NEW YORK (GenomeWeb) – An international team of scientists involved in the Validated Systematic Integration of Hematopoietic Epigenomes (VISION) project has received $6.1 million in grant funding from the National Institutes of Health's National Institute of Diabetes and Digestive and Kidney Diseases. The team will use the funds to integrate and functionally validate large amounts of genomic and epigenetic data gleaned from hematopoietic cells for use in basic research as well as medical applications.
VISION seeks to use data from these cells to provide comprehensive catalogs of validated regulatory modules, quantitative models for gene regulation, and a guide for translating research insights from mouse models to human.
"A person's genetic profile can have a significant impact on disease susceptibility and response to specific treatments. However, the critical genetic variants that make up that genetic profile most often do not code for protein, but rather they are located in the much larger noncoding genome," Ross Hardison, a professor of biochemistry and molecular biology at Pennsylvania State University and team leader for the project, said in a statement. "We are studying these noncoding regions and finding new ways to extract valuable information about functional elements within them, which in turn informs us about how genetic variants play a role in disease."
Specifically, the current grant will support efforts to combine the fruits of several long-running research efforts, Hardison said in an interview. "One of the systems that has been studied very intensely for mammals has to do with genes … that encode the alpha and beta globins that make up the protein part of hemoglobin … because the most commonly inherited diseases in humans are hemoglobinopathies or problems with hemoglobin," he said. "We knew that there were mutations in these genes that could cause problems, and the hope was as we understood better how the gene expression is regulated, maybe we could use those mechanisms of regulation in some way to develop new therapeutic avenues."
But the researchers also seek to identify important regulatory elements in non-coding portions of the genome that could be involved in regulating the hemoglobin genes. "That was the transition from looking at two globin gene clusters during erythropoiesis to starting to think about the entire genome," Hardison said. Working in collaboration with members of the Encyclopedia of DNA Elements (ENCODE) project, "we started using high-throughput assays for various biochemical features associated with regulation [such as] DNase hypersensitivity, histone modification patterns, and transcription factor occupancy." Hardison's lab also performed whole-genome assays to improve its predictions of potential regulatory elements in both the coding and non-coding regions of the genome, he said.
"We wanted to put together a team to consolidate and integrate all of this information now coming out about epigenomic features and cell types, not just in mature erythroid cells but in progenitors and other [cell types]," Hardison said. "We want to help investigators utilize the supply of genomic and epigenomic information" to gain a more "comprehensive understanding of mechanisms of regulation, which then we could put back into the various efforts trying to improve therapeutic approaches to blood diseases."
Seven labs in the US and elsewhere are working on consolidating and integrating data under the grant. In addition to Penn State, other researchers involved in the project are at Children's Hospital of Philadelphia, Johns Hopkins University, St. Jude Children's Research Hospital, Weatherall Institute of Molecular Medicine at Oxford University, NIH's National Human Genome Research Institute, and the California Institute for Medical Research.
Initial efforts will focus on integrating epigenomic information collected from various blood cell types, including mature and progenitor blood cells. Planned subprojects include compiling and integrating basic epigenomic and chromatin interaction frequency information from sources such as ENCODE, the Gene Expression Omnibus, and the International Human Epigenome Consortium as well as from the participating researchers' own labs. Collected datasets include raw ChIP-sequencing data, histone modification data, genome-wide interaction frequency data, and transcription factor information, as well as data from Hi-C and CaptureSeq experiments on erythroid cells.
Those datasets provide the tools needed to predict where candidate regulatory elements are in the genome as well as what the likely targets for those regulatory elements are, Hardison said.
"Once we get these data together, we go through a systematic integration [process]," he said. "We have statisticians … who are developing some very sophisticated ways of utilizing many different tracks of data and collapsing them down to definitions of chromatin states across the different cell types."
The researchers will also test whether the suspected regulatory elements are indeed impacting their predicted target genes. "We are not only going to say, 'I think these five regulatory elements are candidate elements for affecting a particular gene,' we are going to utilize the signal levels in all that epigenetic information to predict the functional output from each one of these answers," he said. "It will allow us to actually make predictions of regulatory output from all of our candidate enhancers."
Specifically, they will use quantitative modeling techniques to predict the functional output of each regulatory element-target gene pairing, and gene editing techniques to assess how the absence of these elements affects the target genes and how large that effect is. They'll then compare the outcomes of the computational and experimental approaches and either modify the models or seek new datasets to improve their predictions.
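As a rough illustration of that compare-and-refine loop, the Python sketch below assumes a simple additive model in which each candidate element contributes its epigenetic signal multiplied by a weight. This is not the VISION team's model. The function names, element names, weights, signal values, and tolerance are all hypothetical and chosen only to show how a predicted deletion effect might be checked against a measured one.

```python
from typing import Dict, List


def predict_expression(signals: Dict[str, float],
                       weights: Dict[str, float],
                       baseline: float = 0.0) -> float:
    """Predict target-gene output as baseline plus weighted element signals.

    Assumption: contributions are additive; real models may not be.
    """
    return baseline + sum(weights[e] * s for e, s in signals.items())


def predicted_deletion_effect(element: str,
                              signals: Dict[str, float],
                              weights: Dict[str, float]) -> float:
    """Predicted drop in output when one candidate element is removed."""
    remaining = {e: s for e, s in signals.items() if e != element}
    return (predict_expression(signals, weights)
            - predict_expression(remaining, weights))


def discordant_elements(predicted: Dict[str, float],
                        observed: Dict[str, float],
                        tolerance: float) -> List[str]:
    """List elements whose measured effect differs from the prediction
    by more than the tolerance; these would prompt a model revision."""
    return sorted(e for e in predicted
                  if abs(predicted[e] - observed[e]) > tolerance)


# Hypothetical signal levels and model weights for two candidate enhancers.
signals = {"enh1": 2.0, "enh2": 1.0}
weights = {"enh1": 0.5, "enh2": 1.5}

predicted = {e: predicted_deletion_effect(e, signals, weights)
             for e in signals}

# Hypothetical measured expression changes after editing out each element.
observed = {"enh1": 1.1, "enh2": 0.2}

to_revisit = discordant_elements(predicted, observed, tolerance=0.5)
print(to_revisit)
```

In this toy setup the second enhancer's measured effect falls well short of the prediction, so it is the one flagged for a revised model or additional data.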
In addition to elucidating regulatory elements and their gene targets, Hardison and his collaborators plan to provide best practices for translating insights from mouse models to human subjects that they can share with the community, he said.
The VISION project shares its data and other resources with the research community via its website and other public repositories. So far, the group has made several datasets available via the website, including some of its ChIP-seq and histone modification data, and it will continue to release more of these integrated datasets in the future. The researchers are also keeping an eye on ongoing efforts of other consortia, such as the 4D Nucleome project, and looking to integrate datasets from those efforts as well, Hardison said.
The group has also provided links to various visualization tools such as the ENCODE Element browser, which lets users query gene expression files and genomic annotations generated by the ENCODE consortium, a 3D Genome Browser that provides heatmap visualizations of Hi-C intra-chromosomal contact matrices, and the BX browser for visualizing epigenomic and transcriptomic data. Other resources include a database and query interface for transcription factor binding sites categorized based on epigenetic conservation in both mouse and human, and a link to a repository of transcriptome and epigenomic ChIP-seq data for hematopoietic cell lineages in mouse and human along with analysis tools.
Since the project focuses on blood cell development as a model system for exploring gene regulation in mammals, Hardison expects that VISION's results will be of particular value to researchers studying leukemia, anemia, and other blood diseases that result from misregulation of gene expression during blood formation. However, the data integration and validation methods being developed and applied here could also be applied to other diseases and research contexts, such as research into muscle development.
"Because of our own [research] history and because it is a system that can be manipulated pretty well, we are focused on a few lineages of hematopoiesis, but it can easily translate to other systems," he said.