Assistant Professor of Cellular and Molecular Medicine
University of California, San Diego
At a Glance
Name: Bing Ren
Title: Assistant Professor of Cellular and Molecular Medicine, University of California, San Diego.
Educational Background: BS, University of Science and Technology, Hefei, China, 1991; MS, computer science, Harvard University; PhD in biochemistry, Harvard University, 1998.
Appearing in this month's edition of Nature, a new paper entitled "A high-resolution map of active promoters in the human genome" signals the arrival of chromatin immunoprecipitation-coupled DNA microarray analysis (ChIP-on-chip) as a method for characterizing how expression is regulated by promoters in the human genome.
Led by researchers at the University of California, San Diego, NimbleGen Systems of Madison, Wisc., and the Ludwig Institute for Cancer Research, the project team completed a high-resolution map of promoters that it hopes will enable detailed analysis of transcription factor binding sites within fibroblast cells, and will serve as a model for characterizing gene expression and understanding cellular logic in other cell types.
To learn more about this study, BioArray News spoke last week with Bing Ren, an assistant professor of cellular and molecular medicine at UC San Diego, who co-authored the Nature paper.
Maybe you could give us a little background on this paper. What led you to this particular research?
We have identified roughly 25,000 genes in the genome, but we know virtually nothing about how their expression is regulated in different types of cells. For individual genes we know how expression is controlled, but on a global scale very little is known.
The National Institutes of Health has initiated an effort called the Encyclopedia of DNA Elements, with the ultimate goal of comprehensively identifying all functional elements in the human genome. My group is part of this effort, and our charge is to identify transcriptional regulatory elements in the genome. We know there are at least three types of regulatory elements: promoters, enhancers, and repressors. There are also other types of regulatory elements.
Promoters happen to be the most important category out of these, because they are sequences that determine whether a gene is turned on or off in the cells. They are sites where the transcription machinery, namely RNA polymerase complexes, binds and initiates transcription. They are also the sites where transcriptional regulators bind and recruit the transcription machinery to initiate transcription. So, the first step in determining gene regulatory mechanisms in cells is obviously to determine the transcriptional regulatory sequences in the genome, and to start that, we first focus on identifying a comprehensive list of promoters in the cells.
And you decided to do this using fibroblast cells.
Fibroblast cells have the advantage that they are easy to grow in culture and they are normal primary cells, so the results that we obtain can be more closely linked to human physiology. Plus, they have been used as a model system by a lot of biologists to study the cell cycle and to study metabolism. To us it's a natural choice.
You wrote that the examination of the expression profile generated by your study revealed four general classes that defined the transcriptome of the cell. Is this new or has any of this research been done before?
Part of the reason [for initiating this project] was to provide an unbiased assessment of many of the current hypotheses or observations in gene regulation. A lot of work has been done to understand how the transcription of individual genes is regulated, and we have seen many different kinds of pathways or regulatory mechanisms. If you look at the literature on individual genes, there are examples of each of [the four classes].
The general assumption is that RNA-polymerase binding would lead to transcription of a gene. But there are also cases where people have reported that the loading of this transcription machinery itself is not sufficient. There are additional steps. So, from the literature there have been reports of all kinds of categories, but there has never been an assessment on a global scale of which is more prevalent. Our observation is that for 75 percent of the promoters there is a close correlation between the loading of the transcription machinery on the promoter and the transcription of the gene. We also found that in roughly 10 percent of the genes, loading of the machinery does not immediately lead to transcription.
This suggests [that] additional regulatory steps in subsequent pathways may be important for the regulation of these genes. There are [some] very interesting mechanisms that this suggests. One is the existence of so-called microRNA genes. We know microRNA genes are a newly found class of RNA gene that plays an important role in regulating the abundance of RNA. But microRNAs are very unstable, at least the pre-messenger. The primary messenger RNAs are unstable and they are rapidly processed into mature microRNA. So we have observed some genes where we saw loading, but we could not detect expression. We suspect that those genes may be subject to microRNA regulation, but we don't have solid evidence. I think this would lead to further experiments.
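As a rough illustration of how promoters might be sorted into classes once binding data and expression data are both in hand, here is a minimal Python sketch. It assumes the classes correspond to the four combinations of "machinery bound" and "transcript detected"; that mapping, the function name `promoter_class`, and the toy gene list are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def promoter_class(pic_bound: bool, transcribed: bool) -> str:
    """Assign a promoter to one of four illustrative classes (assumed mapping)."""
    if pic_bound and transcribed:
        return "bound and transcribed"
    if pic_bound:
        return "bound, not transcribed"
    if transcribed:
        return "transcribed, no binding detected"
    return "neither"


# Hypothetical toy data: gene name -> (machinery bound?, transcript detected?)
toy_genes = {
    "gene_a": (True, True),
    "gene_b": (True, True),
    "gene_c": (True, False),
    "gene_d": (False, True),
    "gene_e": (False, False),
}

# Count how many toy genes fall into each class and report the share of each.
tally = Counter(promoter_class(b, t) for b, t in toy_genes.values())
for label, count in tally.items():
    print(f"{label}: {count / len(toy_genes):.0%}")
```

With real data, the share in each class is what would allow a statement such as "75 percent of promoters show a close correlation" to be made on a genome-wide scale.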
So you used the ChIP-on-chip to do this?
I think there's a little bit of history behind this decision. I was the first to develop this technology to investigate protein-DNA interaction inside the cells. ChIP is now becoming a popular technique to investigate in vivo protein-DNA interactions.
The method works roughly this way: You take living cells, and you treat them with a chemical agent so you can basically cross-link a protein to its DNA-binding substrate. After this you can disrupt the cells and fragment the chromosomal DNA into very small pieces and obtain a very complex mixture. From this mixture of protein-DNA complexes you can use an antibody that recognizes the protein you are interested in to fish out the DNA this protein binds to. This way you can have a relatively pure population of DNA fragments that are bound by this protein inside the cell. To identify that purified DNA you can use DNA microarrays to simultaneously identify all the enriched species in that mixture. And that's what we used to identify the binding sites of the RNA polymerase initiation complex … this is how we identified the [polymerase preinitiation complex]-binding sites.
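The final microarray step, finding the "enriched species," can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the analysis used in the study: it assumes each probe has one intensity from the immunoprecipitated sample and one from a reference (input) sample, and it calls a region bound when at least two neighboring probes show a two-fold or greater enrichment. The function names, thresholds, and intensity values are all hypothetical.

```python
import math


def log2_enrichment(ip_signal, input_signal):
    """Per-probe log2 ratio of immunoprecipitated signal to reference signal."""
    return [math.log2(ip / ref) for ip, ref in zip(ip_signal, input_signal)]


def enriched_regions(ratios, min_log2=1.0, min_probes=2):
    """Return (first, last) probe index pairs for runs of at least
    min_probes consecutive probes whose log2 ratio is >= min_log2."""
    regions = []
    start = None
    for i, ratio in enumerate(ratios):
        if ratio >= min_log2:
            if start is None:
                start = i  # a new run of enriched probes begins here
        else:
            if start is not None and i - start >= min_probes:
                regions.append((start, i - 1))
            start = None
    # Close a run that extends to the last probe.
    if start is not None and len(ratios) - start >= min_probes:
        regions.append((start, len(ratios) - 1))
    return regions


# Hypothetical intensities for ten neighboring probes along a chromosome.
ip_signal = [100, 110, 95, 420, 610, 480, 105, 90, 300, 100]
input_signal = [100] * 10

ratios = log2_enrichment(ip_signal, input_signal)
print(enriched_regions(ratios))  # [(3, 5)]
```

Requiring neighboring probes to agree is one simple way to discount a single noisy probe, such as the isolated spike at index 8 in the toy data.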
What role did NimbleGen play in this?
The NimbleGen microarray technology was essential to reveal the genomic binding sites. The reason is that they have an advanced technology that can synthesize high-density arrays that have close to 400,000 oligos on each array. So to represent the entire human genome, we can use roughly 38 of these arrays. The nice thing about their synthesis technology is that it is all software-controlled, so the initial cost to synthesize them is reasonable.
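The scale implied by those two figures can be worked out with simple arithmetic. The sketch below uses the numbers quoted above (about 400,000 oligos per array, roughly 38 arrays); the genome length of 3.1 billion base pairs is an assumed round figure, and spreading probes evenly over the whole genome is a simplification for illustration rather than a description of the actual array design.

```python
# Figures quoted in the interview.
ARRAYS = 38
OLIGOS_PER_ARRAY = 400_000

# Assumption: an approximate haploid human genome length in base pairs.
GENOME_BP = 3_100_000_000

total_probes = ARRAYS * OLIGOS_PER_ARRAY
# Average distance between probes if they were spaced evenly (a simplification).
avg_spacing_bp = GENOME_BP / total_probes

print(f"Total probes: {total_probes:,}")
print(f"Average spacing if evenly spread: {avg_spacing_bp:.0f} bp")
```

Under these assumptions the set comes to about 15 million probes, or one probe roughly every 200 base pairs.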
Did they offer you any technical support?
This work was done in collaboration with NimbleGen's scientists, with Roland Green, who is a co-investigator in this project, and they have provided a lot of support. Basically we submitted the samples to them and they conducted the hybridization, collected the data, and sent us the microarray data to analyze.
How is your method different from those that were previously used?
Promoters are by definition the transcriptional "start sites" of genes. The most commonly used method to map promoters is by determining the five prime end of the messenger RNA of a gene, and there have been many efforts in the past to systematically clone and sequence full-length genes.
The current promoter definition is based upon full-length gene-sequencing projects. That's what we used. But this effort has some limitations. Specifically, many of the full-length databases that we have today are contaminated by truncated DNA sequences. It's hard to distinguish which ones are full length and which are not. Also, the current full-length sequencing effort is still not complete. There are still many genes whose five prime end we don't know. So our effort is actually an independent effort that can both confirm existing full-length sequence data and identify promoters for genes whose five prime end we do not yet know.
So, importantly, we discovered roughly 10,500 promoters by our approach, and nearly half were not reported to be promoters before. So we have substantially increased our knowledge of promoters.
You also estimated that 13 percent of human genes remain to be annotated on the genome.
Using this unbiased approach, we were able to provide evidence that there are still thousands of genes that have yet to be identified as genes by conventional cloning methods. So, although we can't fully identify the genes and have only the information on their promoters, knowledge of those promoters will focus future work on searching for the genes nearby. Hopefully this will help identify some interesting kinds of genes.
What's the next logical step for your research?
For our research, we'd like to take this step further to characterize active promoters in other human tissues, both normal tissues and tissues in pathological states, like cancer tissues. We would also like to take this step further to characterize the enhancers in the human genome, and the repressors. Our eventual goal is to construct a comprehensive map of transcription regulatory elements in the human genome. I hope that such a map will help scientists to decipher the gene regulatory code that controls transcription of every gene in the cell.