At A Glance:
- Tony Lee, research scientist, Whitehead Institute, Massachusetts Institute of Technology.
- BA in Biochemistry, Harvard, 1992.
- PhD in Biology, MIT, 2002.
- Postdoc, Whitehead Institute, 2002-2003.
Tony Lee is in his eighth year at the Whitehead Institute, where he is currently a research scientist in Richard Young’s lab. He spoke to BioArray News this week about the work Whitehead researchers are doing with yeast promoter-region microarrays.
Can you give me a broad definition of this research?
It’s a location-analysis technology that is part of the Whitehead Institute’s effort to decipher transcriptional regulatory networks genome-wide. The effort is three to four years old, and about eight people are currently working on it.
Basically, we’re trying to figure out where regulatory proteins bind in living cells. When you’re putting together regulatory networks, you want to build connections between genes and the factors that control them. Looking at expression levels alone is an indirect way of attacking this problem. So one of the driving forces behind this technology is that gene expression changes don’t tell you how those changes are brought about.
Transcription is carried out by a whole assembly of proteins, but we originally focused on classical DNA-binding activators, the proteins that bind to specific sequences upstream of genes. We ended up printing a new microarray based on intergenic regions rather than open reading frames. We are looking at a different subset of the genome: not the protein-coding regions, but the regions where regulatory information is embedded.
What are your sources for the sequence data?
We use the standard public-domain yeast genome sequence, with a simple definition of an intergenic region: the sequence between two ORFs. Some of the larger regions were split into multiple probes on the array.
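The intergenic definition above can be made concrete with a short Python sketch. It assumes ORF coordinates are given as 1-based, inclusive (start, end) pairs on a single chromosome, merges overlapping ORFs so no negative-length gaps appear, and splits any gap longer than a hypothetical 1,500 bp cap (borrowed from the roughly 1.5 kb upper end of the fragment sizes quoted in this interview) into roughly equal pieces. The function names and the splitting rule are illustrative assumptions, not the lab's actual probe-design pipeline.

```python
# Hypothetical upper bound on probe length (about the largest fragment size quoted).
MAX_PROBE_BP = 1500


def intergenic_regions(orfs):
    """Return (start, end) gaps lying between consecutive ORFs on one chromosome.

    orfs: iterable of (start, end) ORF coordinates, 1-based inclusive, any order.
    Overlapping or abutting ORFs produce no gap. Sequence before the first ORF
    and after the last one is not "between two ORFs", so it is skipped.
    """
    regions = []
    merged_end = None  # rightmost base covered by any ORF seen so far
    for start, end in sorted(orfs):
        if merged_end is not None and start > merged_end + 1:
            regions.append((merged_end + 1, start - 1))
        merged_end = end if merged_end is None else max(merged_end, end)
    return regions


def split_region(start, end, max_len=MAX_PROBE_BP):
    """Split [start, end] into consecutive, near-equal pieces no longer than max_len."""
    length = end - start + 1
    n_pieces = -(-length // max_len)  # ceiling division
    pieces = []
    for i in range(n_pieces):
        piece_start = start + (i * length) // n_pieces
        piece_end = start + ((i + 1) * length) // n_pieces - 1
        pieces.append((piece_start, piece_end))
    return pieces


def design_probes(orfs, max_len=MAX_PROBE_BP):
    """Tile every intergenic region, splitting the long ones into several probes."""
    return [piece
            for start, end in intergenic_regions(orfs)
            for piece in split_region(start, end, max_len)]
```

For example, ORFs at (100, 200) and (3202, 3500) leave a 3,001 bp gap, which the sketch splits into three probes of about 1 kb each rather than one oversized fragment.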
How big are the arrays?
Each array has around 7,200 features, about 6,900 for different regions of the yeast genome and another 300 control spots of various kinds.
How big are the probes?
We use oligonucleotide primers to PCR-amplify fragments ranging from 100 base pairs to 1.5 kilobases. The average size is just under a kilobase.
How many have been used so far?
We have probably printed 4,000 arrays. On a good week, when everyone is going, we use about 30 to 40 arrays.
How do you do the analysis and the normalization?
For every factor we look at, we perform location analysis in triplicate and use an adaptation of the error model published by Hughes et al. in Cell. For normalization, we usually do a simple median normalization of the bulk signal in each channel: one channel represents DNA enriched for protein binding sites, and the other is a control sample of unenriched cellular DNA.
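A minimal Python sketch of per-channel median normalization, assuming background-subtracted, strictly positive spot intensities supplied as plain lists in the same spot order for both channels. Replicates are combined here by simply averaging log2 ratios spot by spot; that averaging is a stand-in chosen for brevity and is not the Hughes et al. error model, which models measurement error explicitly.

```python
import math
from statistics import mean, median


def median_normalize(channel):
    """Scale one channel so that its median spot intensity becomes 1.0."""
    m = median(channel)
    return [x / m for x in channel]


def log2_enrichment(enriched, control):
    """Per-spot log2(enriched / control) after normalizing each channel separately.

    Assumes all intensities are positive (e.g. background-subtracted and floored),
    since log2 of zero or a negative ratio is undefined.
    """
    enriched_n = median_normalize(enriched)
    control_n = median_normalize(control)
    return [math.log2(e / c) for e, c in zip(enriched_n, control_n)]


def combine_replicates(replicate_ratios):
    """Average log2 ratios across replicate arrays, spot by spot.

    Simplified stand-in only: this is NOT the Hughes et al. error model.
    """
    return [mean(spot_values) for spot_values in zip(*replicate_ratios)]
```

Because each channel is divided by its own median, a spot with the same relative signal in both channels ends up with a log2 ratio near zero, while a spot enriched by immunoprecipitation stands out as positive.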
What do the arrays look like?
They look like most glass slide arrays. The one thing that is a little different is the colors we expect to see. Gene expression arrays are looking for changes between two states. So if you’re using a standard red-green color scheme, you expect to see red, green and yellow: genes are differentially expressed in one condition (red or green) or equally expressed (yellow). If we see that with the location analysis, it usually means something has gone wrong. We are only expecting enrichment in one channel so we look for a uniform yellow background and red spots. If we see a lot of green spots, something has gone wrong.
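The color check can be written as a simple QC rule over normalized log2(enriched/control) ratios. This sketch assumes the enriched channel is displayed in red; the ±1 log2 cutoff and the 1% tolerated fraction of green spots are hypothetical values chosen for illustration, not the lab's settings.

```python
# Hypothetical QC settings, chosen only for illustration.
LOG2_CUTOFF = 1.0          # |log2 ratio| at or above this counts as red or green
MAX_GREEN_FRACTION = 0.01  # tolerated fraction of green spots


def classify_spot(log2_ratio, cutoff=LOG2_CUTOFF):
    """Label a spot the way it would look in a red (enriched) / green (control) overlay."""
    if log2_ratio >= cutoff:
        return "red"     # enriched channel dominates: candidate binding site
    if log2_ratio <= -cutoff:
        return "green"   # control channel dominates: unexpected in location analysis
    return "yellow"      # similar signal in both channels: expected background


def qc_report(log2_ratios, cutoff=LOG2_CUTOFF, max_green=MAX_GREEN_FRACTION):
    """Count spot colors and flag the array if too many spots are green."""
    if not log2_ratios:
        raise ValueError("no spots to evaluate")
    counts = {"red": 0, "yellow": 0, "green": 0}
    for ratio in log2_ratios:
        counts[classify_spot(ratio, cutoff)] += 1
    green_fraction = counts["green"] / len(log2_ratios)
    return counts, green_fraction > max_green
```

The asymmetry is the point: in an expression experiment green spots are informative, but here a surplus of them is treated as a sign that something went wrong.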
How do you prepare the libraries?
To figure out where the proteins are bound, we do chromatin immunoprecipitation. We have a large collection of yeast strains in which various proteins have been epitope-tagged. We grow the cells in rich media and then add formaldehyde, which crosslinks the regulator to the DNA. So if the regulator binds in vivo, it’s now crosslinked to specific regions of DNA. Then we lyse the cells, shear the DNA and immunoprecipitate the proteins using antibodies specific for the epitope tag. Because each protein is crosslinked to the DNA it was bound to, this enriches for the regions of the genome bound by that protein.
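Why crosslinking plus immunoprecipitation enriches bound regions can be illustrated with a toy model. Suppose a region is occupied by the tagged factor in some fraction of cells, the antibody pulls down crosslinked fragments with some efficiency, and every fragment has a small chance of being carried through nonspecifically. The parameter names and the independence assumption are illustrative only, not measured properties of the protocol.

```python
def expected_enrichment(occupancy, ip_efficiency, background):
    """Toy model of ChIP enrichment for a single genomic region.

    occupancy:     fraction of cells in which the tagged factor is crosslinked here
    ip_efficiency: chance the antibody recovers a crosslinked fragment
    background:    chance any fragment is carried through nonspecifically (> 0)

    Assumes specific and nonspecific recovery are independent. Returns recovery
    of this region relative to an unbound region, which is recovered only at
    the background rate.
    """
    specific = occupancy * ip_efficiency
    # A fragment is recovered if it is pulled down specifically or nonspecifically.
    recovered = 1 - (1 - specific) * (1 - background)
    return recovered / background
```

With no occupancy the ratio is 1 (no enrichment); with full occupancy, 50% pull-down efficiency and 1% background it is about 50, which is why bound regions show up as strongly red spots against a yellow background.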
There are a couple of novel aspects to the work. This approach combines chromatin immunoprecipitation with the microarray, so you get all of the binding sites in the whole genome for a particular protein. It’s one of the few ways to get a genome-wide view of protein binding in vivo. Because the yeast genome is small and well defined, it’s feasible to examine the entire regulatory network. Last year, we looked at 106 DNA-binding regulators. Our goal is to collect data for the complete set of DNA-binding regulators and then expand to other factors, such as chromatin factors. In the future, we plan to extend the research to different environmental conditions. All of our previous work has been done in a single growth condition, which is not a good model for how cells respond to changes in their environment.
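Once binding sites are called for each regulator, the regulatory network is essentially a map from regulators to the genes whose promoter regions they bind. This hypothetical sketch assumes binding calls arrive as (regulator, gene, p_value) tuples and uses an illustrative p-value cutoff of 0.001; neither the input format nor the cutoff comes from the interview, and the regulator and gene names in the example are placeholders.

```python
from collections import defaultdict

# Hypothetical significance cutoff for calling a promoter region "bound".
P_CUTOFF = 0.001


def build_network(binding_calls, p_cutoff=P_CUTOFF):
    """Turn (regulator, gene, p_value) binding calls into a regulator -> targets map."""
    targets = defaultdict(set)
    for regulator, gene, p_value in binding_calls:
        if p_value < p_cutoff:
            targets[regulator].add(gene)
    return dict(targets)


def regulators_of(network, gene):
    """Invert the map: which regulators bind this gene's promoter region?"""
    return sorted(reg for reg, genes in network.items() if gene in genes)
```

For example, `build_network([("REG_A", "GENE_1", 1e-6), ("REG_B", "GENE_1", 1e-4)])` links both placeholder regulators to GENE_1, and `regulators_of` then recovers the pair. Keeping both directions cheap to query matters once data for every regulator, and later every growth condition, are added.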
This is a very interesting technology because it continues to open up possibilities for understanding cellular organization and regulation. A lot of genome-wide technologies are descriptive; that is, they describe the state of a particular aspect of the cell. In the future, we’re going to see increasingly sophisticated combinations of these genome-wide technologies used to build models of the molecular mechanisms that explain gene expression.
This approach will be a key step in making that leap from description to explanation, because it helps bridge that gap and explain the mechanism. It’s very clear, though, that it is just one of a number of genome-wide data sources available. Comparative sequence from multiple genomes and large numbers of RNA microarray experiments will also be very interesting data sets in the future.