ENCODE Explosion

The Encyclopedia of DNA Elements, or ENCODE, project just published a glut of papers (30 publications in a variety of journals from more than 440 researchers) looking at the function of various regions of the human genome. The project's "aim is to catalogue the 'functional' DNA sequences that lurk there, learn when and in which cells they are active and trace their effects on how the genome is packaged, regulated and read," writes Brendan Maher at Nature. Nature has set up a special portal housing the ENCODE papers and commentary.

In the overview paper in Nature, the ENCODE researchers note that 80 percent of the human genome has a biochemical function. Further, in The New York Times, Gina Kolata writes that researchers found that "the human genome is packed with at least four million gene switches that reside in bits of DNA that once were dismissed as 'junk' but that turn out to play critical roles in controlling how cells, organs and other tissues behave."

ENCODE researcher Ewan Birney tells Ed Yong at Not Exactly Rocket Science that the 80 percent figure will increase, possibly reaching as high as 100 percent. "We don't really have any large chunks of redundant DNA," Birney says. "This metaphor of junk isn't that useful." (Birney has his own blog post here.)

In the Times, Eric Lander likens the work to Google Maps. The Human Genome Project "was like getting a picture of Earth from space," he says. "It doesn't tell you where the roads are, it doesn't tell you what traffic is like at what time of the day, it doesn't tell you where the good restaurants are, or the hospitals or the cities or the rivers."

At his blog, Michael Eisen says that much of how the ENCODE work has been presented, both in press releases and in news accounts, is incorrect. He calls part of Kolata's characterization of the work (quoted above) "complete crap," but says it was nearly inevitable given how the project has been portrayed. He adds that Birney, in his own blog post, gave a measured account of the 80% idea. There, Birney says "a conservative estimate of our expected coverage of exons + specific DNA:protein contacts gives us 18%, easily further justified (given our sampling) to 20%." But Eisen says that "his quotes in the press release play a bit fast and loose with this issue."

Others, like T. Ryan Gregory at Genomicron and Leonid Kruglyak on his Twitter feed, also take issue with the 80 percent figure. Gregory notes that the figure is for sequences with biochemical activity, which he says is "a term even more loosely defined than 'function.'" He adds that the researchers chose that number because "a) it generates attention, and b) people are too busy to grasp a nuanced discussion of '20% potentially functional given present evidence, but up to 80% has some kind of activity that might also imply function.'"

Daily Scan's sister publication GenomeWeb Daily News has more on the ENCODE papers here.


It is well accepted that less

It is well accepted that less than 3% of the human genome actually appears to encode proteins or RNA. Over 8% of the human genome consists of remnants of recognizable viral DNA that was integrated into the genomes of our ancestors over millions of years of evolution. Nevertheless, it is clear that the genomic DNA flanking these genes contains many promoter and repressor elements that regulate their expression, as well as sequences needed for genome structure, replication, repair and degradation. Introns within genes also allow for alternative splicing. All of this has been known for decades, although it had never been mapped this extensively before the ENCODE effort. There are also likely to be many surprises still in store in the so-called "junk" or "dark" DNA.

That being said, I do wonder how many of the putative transcription factor and histone interactions with DNA sequences are actually non-specific and truly inconsequential. Such low-level interactions may simply be noise that is tolerated. However, the main reason I have a hard time accepting that about 80% of the human genome sequence is functional and important is the data from other species that have a similar number of genes but extremely divergent amounts of DNA. For example, the genome of the fruit fly Drosophila melanogaster has 0.165 billion nucleotide base pairs (nbp), whereas that of the mountain grasshopper Podisma pedestris has 14 billion nbp, and that of the flower Fritillaria assyriaca has a whopping 124.9 billion nbp. At about 3.2 billion nbp, the human genome falls between the two insects in size. Although the fruit fly has about 85 times less DNA than the mountain grasshopper, both are very successful hexapods.

There appears to be strong evolutionary pressure in multicellular organisms to retain excess baggage simply to make sure that the important parts are kept. There are countless examples of this, ranging from the extensive remodelling of embryos during early development to the hundreds of thousands of superfluous phosphorylation sites in the proteins encoded by the human genome. From gross anatomy down to the molecular level, biology is full of inefficiencies. As I have pointed out above, DNA sequencing studies in diverse organisms have increasingly revealed an extreme range of genome sizes, even among organisms with a relatively similar number of genes. It seems highly unlikely that these larger genomes exist to provide more regulation in some organisms than in others. Over-regulation can also be highly disadvantageous, as observed, for example, with more bureaucratic governments.

I cannot help but laugh now

I cannot help but laugh now at Bill Haseltine, who chided J. Craig Venter and Celera, stating that sequencing the whole human genome was not worth it. Wrong again.

I remember standing up in a

I remember standing up at a human genome meeting in the late '80s and pronouncing "God don't make junk".