AT A GLANCE
Senior research fellow, genomic pharmacology, Merck & Co.
Initiated Merck’s DNA microarray effort in 1995.
Co-authored the first article applying microarrays to drug metabolism and drug safety: “Monitoring expression of genes involved in drug metabolism and toxicology using DNA microarrays,” Physiological Genomics 5:161-170, 2001.
PhD in microbiology from the University of Tennessee, Knoxville, in 1989. Studied plant-induced genes in Rhizobium bacteria.
Completed a Fulbright fellowship in Hungary; NIH training fellowship at the University of Tennessee; NIH postdoctoral fellowship at the Jefferson Cancer Institute.
Interests include developing and implementing custom oligonucleotide microarray technologies and gene expression analysis using Affymetrix arrays.
In your talk at the recent Macroresults through Microarrays conference in Boston, you discussed your work with custom oligonucleotide arrays. Why do you use these and not the Affymetrix arrays that Merck buys?
We didn’t develop the custom technology to duplicate what Affymetrix does, but rather to produce lots of arrays, each with a smaller number of genes, at low cost. We wanted them to be cheap and readily accessible, so people literally don’t have to think twice about how many arrays an experimental design needs. The first array costs about $40,000, but after that they are about $10 apiece, so it pays to make and buy hundreds or thousands.
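The arithmetic behind that cost argument is simple amortization. The minimal Python sketch below treats the roughly $40,000 first array as a one-time cost and each additional array as about $10, the two figures quoted above; the function and constant names are illustrative only.

```python
# Average cost per custom array. Figures are the rough ones quoted in the
# interview: ~$40,000 for the first array, ~$10 for each one after that.
FIRST_ARRAY_COST = 40_000.0
MARGINAL_COST = 10.0


def average_cost_per_array(n_arrays: int) -> float:
    """Average cost per array when n_arrays are produced from one design."""
    if n_arrays < 1:
        raise ValueError("n_arrays must be at least 1")
    total = FIRST_ARRAY_COST + MARGINAL_COST * (n_arrays - 1)
    return total / n_arrays


if __name__ == "__main__":
    for n in (1, 10, 100, 1_000, 10_000):
        print(f"{n:>6} arrays: ${average_cost_per_array(n):,.2f} each")
```

Under these assumptions, 1,000 arrays average about $50 each and 10,000 arrays about $14, approaching the $10 marginal cost.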
Where do you get your oligos?
We compared three companies, ordering the same plate of oligos from each. The one that beat the others hands-down was Integrated DNA Technologies. I have since referred others to IDT, and they’ve been happy.
Some groups have said they prefer large sets of short oligos, such as 20-mers, while another movement afoot favors long oligos, from 50- to 70-mers. But in your talk you said you went with 45-mers. Why did you choose this in-between length?
Shorter oligos give you better specificity and better hybridization kinetics, and there is less secondary structure to worry about. Their principal limitation is that their hybridization is reversible under almost any condition. Right at the 40- to 50-nucleotide length it becomes irreversible, and obviously there are some advantages to irreversible hybridization. We did some tests comparing 45-mers to 70-mers for the same set of genes. The 45-mers showed similar sensitivity but better specificity. We said, well, it’s a little cheaper to make 45-mers and they’re easier to design.
So how do you integrate these oligo arrays with your projects on the Affymetrix arrays?
These oligo arrays were designed for second-pass experiments. That strategy has changed since Merck acquired Rosetta Inpharmatics, because Rosetta has its own in-house arraying capabilities.
But before this all happened, you developed a software program called Merck Oligo Design, or MOD, to select oligos. Can you explain how this works?
You tell it what parameters you want to use and feed it a list of DNA sequences to design from. Essentially, it starts at a very high stringency and looks for oligos that are exactly what we think we want, then relaxes those criteria step by step until it produces a list of six to 10 oligos for each sequence. It runs fast enough to churn through all known human, mouse, and rat gene sequences within one day.
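The start-strict-then-relax strategy described above can be sketched in Python as follows. This is not MOD’s code: the two criteria (a GC-content window and a maximum single-base run), their threshold values, the non-overlap rule, and all names are hypothetical placeholders. Only the overall loop (relax stringency until six to 10 oligos come out) and the 45-nt length discussed earlier come from the interview. Screening candidates against other genes for cross-hybridization, which oligo design tools typically do, is omitted.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical sketch of a "start strict, then relax" oligo picker. Only the
# strategy comes from the interview; the criteria and thresholds below are
# invented placeholders, not the parameters MOD actually uses.


@dataclass(frozen=True)
class Criteria:
    gc_min: float         # minimum GC fraction of the oligo
    gc_max: float         # maximum GC fraction of the oligo
    max_homopolymer: int  # longest allowed run of one base (e.g., AAAA)


# Ordered from most to least stringent (hypothetical values).
STRINGENCY_LEVELS = [
    Criteria(0.45, 0.55, 3),
    Criteria(0.40, 0.60, 4),
    Criteria(0.35, 0.65, 5),
    Criteria(0.30, 0.70, 6),
]


def gc_fraction(seq: str) -> float:
    return sum(base in "GC" for base in seq) / len(seq)


def longest_run(seq: str) -> int:
    best = run = 1
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if cur == prev else 1
        best = max(best, run)
    return best


def passes(oligo: str, c: Criteria) -> bool:
    return (c.gc_min <= gc_fraction(oligo) <= c.gc_max
            and longest_run(oligo) <= c.max_homopolymer)


def design_oligos(sequence: str, length: int = 45,
                  min_oligos: int = 6, max_oligos: int = 10) -> List[Tuple[int, str]]:
    """Return up to max_oligos non-overlapping (start, oligo) pairs.

    Tries each stringency level in turn and stops at the first level that
    yields at least min_oligos; if none does, returns what the most relaxed
    level found (possibly fewer than min_oligos).
    """
    seq = sequence.upper()
    picked: List[Tuple[int, str]] = []
    for criteria in STRINGENCY_LEVELS:
        picked = []
        next_free = 0  # first start position that does not overlap a picked oligo
        for start in range(len(seq) - length + 1):
            if start < next_free:
                continue
            oligo = seq[start:start + length]
            if passes(oligo, criteria):
                picked.append((start, oligo))
                next_free = start + length
                if len(picked) == max_oligos:
                    break
        if len(picked) >= min_oligos:
            break
    return picked
```

For example, `design_oligos("ACGT" * 200)` returns ten non-overlapping 45-mers starting at positions 0, 45, ..., 405, all accepted at the strictest level; a sequence whose windows only fit the looser GC window is picked up at the second level instead.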
Why use six oligos per gene instead of one, if you select the oligos so stringently?
Nobody really understands how oligos hybridize to each other: there is a set of rough rules, but the other half is black magic. If you test them thoroughly using the Transcript Pools approach, you can really decide which oligo works and go down to one oligo. I guess the principal reason we use several is that we are not very limited for space on the chip. It is efficient for us to make arrays that cover 300 to 1,000 genes. If we try to make more than that, we are limited by bioinformatics. It really takes a lot more time to pick the right gene sequences than to make the arrays.
In your talk, you discussed a pooled spiking procedure for assessing the sensitivity and specificity of a given oligo array. As I understand it, you produce transcripts for each gene on the array and divide them into subsets, or pools. You then spike a known quantity of transcripts into RNA from a foreign organism, hybridize them to complementary spots on an array, and measure the signal intensity at each known transcript abundance. Can you tell me more about this?
We developed this Transcript Pools, or TraP, procedure because microarray technologies are generally poorly characterized. We used yeast RNA to dilute our transcripts because it is about as unrelated to the mammalian genes as possible. When we look at sensitivity, we want to be able to detect a given gene transcript at one per 100,000 transcripts. That’s why we express each spiked transcript as a ratio to the total mRNA. We know exactly how much transcript we put in because we make these transcripts in vitro, purify them, quantitate them, and dilute them to a constant concentration before spiking them in. Our statisticians looked at each gene individually and plotted the performance of its oligos, so we can see what level of signal each oligo generates. We also see which oligos cross-hybridize to other genes and which are gene-specific.
With this approach, it is entirely possible to make a generalizable application. If somebody were to collect all of the genes in each species, do the pooling, and prepare the RNA, it would be relatively straightforward for anybody who wants to evaluate an array technology to do so for all of the genes on that array. Everyone could use the same pools to evaluate a technology, figure out which oligo works best for each gene, and determine the sensitivity of their arrays. You could also compare across technologies. A manufacturer could do a thorough benchmark experiment once, rather than making each customer do redundant benchmark experiments.
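The specificity logic of the pooled-spike design can be sketched as below. Because each transcript is spiked into only one pool, an oligo that produces signal on an array hybridized with a pool lacking its target must be binding something else. Assumptions, all hypothetical: each gene sits in exactly one pool, each pool is hybridized to its own array, signals are background-subtracted, and a single fixed threshold defines detection. The interview does not describe the statistics Merck actually used; the `best_oligo_per_gene` step illustrates the idea of narrowing to one oligo per gene once the oligos have been tested.

```python
from typing import Dict

# Hypothetical sketch of pooled-spike specificity scoring (not Merck's code).
# Each gene's in vitro transcript is spiked into exactly one pool, and each
# pool is hybridized to its own array, so a well-behaved oligo should light
# up only on the array for its own target's pool.


def classify_oligos(target_of: Dict[str, str],
                    pool_of: Dict[str, str],
                    signals: Dict[str, Dict[str, float]],
                    threshold: float) -> Dict[str, str]:
    """Label each oligo 'specific', 'cross-hybridizing', or 'not detected'.

    target_of: oligo id -> gene the oligo was designed against
    pool_of:   gene -> pool containing that gene's spiked transcript
    signals:   oligo id -> {pool id -> background-subtracted signal}
    threshold: signal above which a spot counts as detected (assumed fixed)
    """
    labels: Dict[str, str] = {}
    for oligo, target in target_of.items():
        home = pool_of[target]
        per_pool = signals[oligo]
        # Signal in any pool that lacks the target means off-target binding.
        off_target = [p for p, s in per_pool.items() if p != home and s > threshold]
        if off_target:
            labels[oligo] = "cross-hybridizing"
        elif per_pool.get(home, 0.0) > threshold:
            labels[oligo] = "specific"
        else:
            labels[oligo] = "not detected"
    return labels


def best_oligo_per_gene(target_of: Dict[str, str],
                        pool_of: Dict[str, str],
                        signals: Dict[str, Dict[str, float]],
                        labels: Dict[str, str]) -> Dict[str, str]:
    """Pick the gene-specific oligo with the strongest on-target signal."""
    best: Dict[str, str] = {}
    for oligo, target in target_of.items():
        if labels[oligo] != "specific":
            continue
        home = pool_of[target]
        if target not in best or signals[oligo][home] > signals[best[target]][home]:
            best[target] = oligo
    return best


if __name__ == "__main__":
    # Made-up example values for two hypothetical genes in two pools.
    target_of = {"A-1": "geneA", "A-2": "geneA", "A-3": "geneA", "B-1": "geneB"}
    pool_of = {"geneA": "pool1", "geneB": "pool2"}
    signals = {
        "A-1": {"pool1": 850.0, "pool2": 12.0},
        "A-2": {"pool1": 900.0, "pool2": 430.0},  # lights up without its target
        "A-3": {"pool1": 620.0, "pool2": 8.0},
        "B-1": {"pool1": 5.0, "pool2": 20.0},     # too weak to detect
    }
    labels = classify_oligos(target_of, pool_of, signals, threshold=100.0)
    print(labels)
    print(best_oligo_per_gene(target_of, pool_of, signals, labels))
```

With these made-up values, A-1 and A-3 are labeled gene-specific, A-2 cross-hybridizing, and B-1 not detected, and A-1 is kept as the single oligo for geneA.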
Are there any things you are working on to improve this system?
The principal thing we’re never satisfied with is the sensitivity. Most well-developed array technologies seem to hit a sensitivity wall at around one part in 300,000. But in a lot of our experiments using animal tissues, we need to do a lot better than that. We have been confounded in some experiments trying to detect things like G protein-coupled receptors (GPCRs) and cytokines. GPCR transcripts are often confined to a select subset of cells, often at low copy number, and DNA array technology does pretty badly at detecting them. With brain tissue, we know that all of the neurotransmitter receptors are there, but we can detect only a subset. If we could get the sensitivity, microarrays would be much more powerful. The limit is not the amount of signal; it is signal-to-noise, set by the limited specificity of hybridization reactions. Heterologous nucleic acids associate where you don’t want them to and produce the noise. Maybe the bases are associating through non-Watson-Crick interactions.
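A back-of-envelope calculation shows why transcripts confined to a few cells fall below a one-in-300,000 limit. It assumes a mammalian cell holds on the order of 300,000 mRNA molecules, a rough figure that varies by cell type; that number and the example scenarios are assumptions, not data from the interview.

```python
# Back-of-envelope conversion between "one part in N" and copies per cell.
# ASSUMPTION: ~300,000 mRNA molecules per mammalian cell (a rough order of
# magnitude; real values vary widely by cell type).
MRNA_PER_CELL = 300_000
ARRAY_LIMIT_ONE_IN = 300_000  # sensitivity wall quoted in the interview


def bulk_abundance(copies_per_expressing_cell: float,
                   fraction_of_cells_expressing: float,
                   mrna_per_cell: float = MRNA_PER_CELL) -> float:
    """Fraction of total tissue mRNA contributed by one transcript.

    Transcripts made by only a subset of cells are diluted by the mRNA of
    every other cell in the sample.
    """
    return copies_per_expressing_cell * fraction_of_cells_expressing / mrna_per_cell


def detectable(abundance: float, limit_one_in: float = ARRAY_LIMIT_ONE_IN) -> bool:
    """True if the abundance is at or above a 1-in-`limit_one_in` detection limit."""
    return abundance >= 1.0 / limit_one_in


if __name__ == "__main__":
    # Hypothetical scenarios, not measurements.
    scenarios = {
        "2 copies/cell in every cell": (2, 1.0),
        "10 copies/cell in 2% of cells": (10, 0.02),
    }
    for name, (copies, fraction) in scenarios.items():
        a = bulk_abundance(copies, fraction)
        print(f"{name}: 1 in {1 / a:,.0f}, detectable={detectable(a)}")
```

Under that assumption, the quoted wall corresponds to about one copy per cell averaged across the whole sample, so a hypothetical receptor at 10 copies per cell in 2% of cells (about one part in 1.5 million) sits roughly five-fold below it.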
So how do you get past this limitation of array technologies?
The solution today is to go to quantitative RT-PCR. We did a study with the drug metabolism group here in which we treated rats with compounds known to induce cytochrome P450 genes. With arrays, we could detect that the CYP1A1 gene was induced at least nine-fold, but we also knew that the uninduced level was down in the noise. When we repeated those measurements with TaqMan quantitative RT-PCR, we found the real induction was not nine-fold but 5,000-fold.
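One way to see how a 5,000-fold induction can read as nine-fold on an array is a simple additive-background model, an illustrative assumption rather than a description of the actual experiment: each spot reports true signal plus a constant nonspecific background, so the observed ratio is (induced + background) / (uninduced + background), which collapses toward 1 when the uninduced level sits in the noise. The sketch also includes the standard comparative-Ct (2^-ΔΔCt) arithmetic often applied to TaqMan data, assuming 100% amplification efficiency; the interview does not say how the RT-PCR data were quantified.

```python
import math

# Illustrative model (an assumption, not from the interview): each array spot
# reports true signal plus a constant nonspecific background.


def observed_ratio(true_fold: float, uninduced: float, background: float) -> float:
    """Induced/uninduced ratio an array would report under additive background."""
    induced = true_fold * uninduced
    return (induced + background) / (uninduced + background)


def background_for_observed(true_fold: float, observed: float) -> float:
    """Background (in units of the uninduced signal) that compresses true_fold to observed.

    Solves (F*u + b) / (u + b) = R for b with u = 1, giving b = (F - R) / (R - 1).
    """
    return (true_fold - observed) / (observed - 1.0)


def fold_from_ddct(ddct: float, efficiency: float = 1.0) -> float:
    """Comparative-Ct fold change, (1 + E) ** (-ddCt); E = 1.0 means perfect doubling."""
    return (1.0 + efficiency) ** (-ddct)


def ddct_for_fold(fold: float) -> float:
    """ddCt (treated minus control, in cycles) implied by a fold change at E = 1.0."""
    return -math.log2(fold)


if __name__ == "__main__":
    b = background_for_observed(true_fold=5000, observed=9)
    print(f"background needed: {b:.1f}x the uninduced signal")
    print(f"check: observed ratio = {observed_ratio(5000, 1.0, b):.1f}")
    print(f"ddCt for 5,000-fold: {ddct_for_fold(5000):.1f} cycles")
```

Under this model, a background about 620 times the uninduced signal is enough to compress a 5,000-fold induction to an apparent nine-fold, and a 5,000-fold change corresponds to a ΔΔCt of about -12.3 cycles.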
What do you think the microarray community needs to develop next?
High-throughput [hybridization] is something we need pretty soon. There are some cases where you would like to run thousands and thousands of arrays. We need to be able to isolate RNA in a high-throughput manner, and that’s not such an easy thing: collecting and grinding up samples is difficult. We also need to be able to set up the hybridizations and do the washes in a high-throughput manner. I am not aware that anybody has solved these engineering problems yet. So much else in pharma, and in science in general, is done in a 96-well format. Is it possible to do hybridization in a 96-well format? For the first few years, microarrays were done by molecular biologists like me, not by engineers or statisticians, so the technology has been limited by our proclivities. Doing these things in a 96-well format would be a great leap forward.