NEW YORK (GenomeWeb News) – An international collaboration involving more than 400 researchers working to characterize gene regulatory networks in the human genome is publishing dozens of new studies this week.
In papers appearing in Nature, Science, Genome Research, Genome Biology, Journal of Biological Chemistry, and elsewhere, members of the Encyclopedia of DNA Elements, or ENCODE, consortium describe approaches used to define some four million regulatory regions in the genome, among other things. All told, the team explained, ENCODE efforts have made it possible to assign biological functions to around 80 percent of genome sequences, filling in large gaps left by studies that focused on protein-coding sequences alone.
"We found that a much bigger part of the genome — a surprising amount, in fact — is involved in controlling when and where proteins are produced, than in simply manufacturing the building blocks," ENCODE's lead analysis coordinator Ewan Birney, associate director of the European Molecular Biology Laboratory European Bioinformatics Institute, said in a statement.
"This concept of 'junk DNA,' which has been sort of perpetuated for the past 20 years or so is really not accurate," ENCODE researcher Rick Myers, director of the HudsonAlpha Institute for Biotechnology, said during a telephone briefing with reporters today. "Most of the genome — more than 80 percent of the base pairs in the genome — has some biological activity, some biological function."
Researchers participating in GENCODE, a complementary effort within the larger ENCODE project, have more completely characterized the coding portions of the genome. "As part of the ENCODE project, we both tidied up the protein-coding genes and we also found many non-coding RNA genes as well," Birney said during today's telebriefing.
Based on the success of ENCODE so far, the project is expected to be extended by another four years or so. The amount of new funding from the National Human Genome Research Institute for that follow-up work is expected to be as high as $123 million.
"Later this month, NHGRI will be announcing a new round of funding that will take the ENCODE project into its next phase," NHGRI Director Eric Green said during the call.
Studies done in the decade or so since the human genome was deciphered have highlighted how little of the genome actually consists of gene sequences. With the realization that only around 2 percent of the genome is dedicated to protein-coding functions came a spate of speculation about the role of the other 98 percent of the genome.
While this portion of the genome was suspected of harboring regulatory sequences, the extent of that regulation and its impact on coding sequences in human tissues over time were not known.
"When the Human Genome Project ended in 2003, we quickly realized that we understood the meaning of only a very small percent of the human genome's letters," Green explained. "We did know the genetic code for determining the order of amino acids and proteins, but we understood precious little about the signals that turned genes on or off — or that controlled the amount of proteins produced in different tissues."
To begin studying such control networks systematically, the international ENCODE consortium kicked off the main phase of its analyses in 2007, following an earlier pilot study.
NHGRI has provided $123 million for the project over the past five years. Another $30 million went to support the development of ENCODE-related technologies since the ENCODE pilot started in 2003, while $40.6 million from NHGRI went towards the pilot itself.
During the study's main phase, investigators from nearly three dozen labs around the world took multi-pronged approaches to assess transcription factor binding patterns, histone modification patterns, chromatin structure signatures, and other features of the genome that interact with one another to control gene expression over time and across different tissues in the body.
For the roughly 1,600 experiments done on some 180 cell types for ENCODE, teams turned to methods such as chromatin immunoprecipitation coupled with sequencing to define the genome-wide binding patterns for more than 100 different transcription factors. Other strategies were used to profile DNA methylation patterns, chromatin features, and so forth.
"It's really a detailed hierarchy, where proteins bind and epigenetic marks — like DNA methylation and other marks — precisely cooperate and regulate how the genes are going to get turned on [or off] and the amount of this," Myers said. "These complex networks are one of the big components of the contributions of the 30 papers that are being published today."
For example, a University of Washington-led team reporting in Science online today defined millions of regulatory regions, including some that are operational during normal development, by taking advantage of an enzyme known as DNase I, which cuts DNA specifically at open chromatin sites in the genome. That group found that more than three-quarters of disease-associated variants identified in genome-wide association studies fall in parts of the genome that overlap with regulatory sites.
"We now know that the majority of these changes that are associated with common diseases and traits that don't fall within genes actually occur within the gene-controlling switches," University of Washington genome sciences researcher John Stamatoyannopoulos, senior author on that study, said during today's telebriefing. "This phenomenon is not confined to a particular type of disease. It seems to be present across the board for a very wide variety of different diseases and traits."
Results from such analyses also hint that some outwardly unrelated conditions might be traced back to similar regulatory processes. And, researchers say, by bringing together information on active regulatory regions with disease-risk variants, it may be possible to define new functionally important tissues for certain conditions.
"By creating these extensive blueprints of the control circuitry, we're now exposing previously hidden connections between different kinds of diseases that may explain common clinical features," Stamatoyannopoulos said.
"This has also allowed us to see that the GWAS studies that have been performed contain far more information than was previously believed," he added, "because hundreds of additional DNA changes that were not thought to be important also appear to affect these gene-controlling switches."
The new data are also expected to help in understanding genetic disease and interpreting information from personal genomes, according to Michael Snyder, an ENCODE investigator and director of Stanford University's Center for Genomics and Personalized Medicine.
"We believe the ENCODE project will have a profound impact on personal genomes and, ultimately on personalized medicine," Snyder told reporters. "We can now better see what personal variants do, in terms of causing phenotypic differences, drug responses, and disease risk."
Many of the studies stemming from ENCODE can be viewed through a website conceived by Nature, Genome Research, and Genome Biology that links ENCODE papers sharing related themes, or "threads."
Along with the newly published papers, the ENCODE team is making data available to other members of the research community through the project's website. Data from studies can also be accessed through an ENCODE browser housed at the University of California at Santa Cruz or via NCBI or EBI sites.
"For basic researchers, the ENCODE data represents a powerful resource for understanding fundamental questions about how life is encoded in our genome," NHGRI's Green said. "For more clinically-oriented researchers, the ENCODE data provide key information about which genome sequences are functionally important."