AT A GLANCE
PhD, molecular and developmental biology, Cambridge University, UK.
Postdoctoral fellowship 1997-98 at Lawrence Berkeley National Laboratory with Edwin Rubin. Worked with transgenic mice with inserts of Chromosome 21, screening for genes implicated in mental retardation.
As a researcher at the department of molecular and medical pharmacology at UCLA, focuses on analyzing the neurobiology of behavior and the etiology of human psychiatric disorders using large insert transgenic mice and microarrays.
Recently, you published a paper in the Journal of Neuroscience Research, in which you looked for genes implicated in learning and memory using mouse cDNA microarrays and mouse hippocampus tissue from four different strains. How did you arrive at this technology as a mechanism to study genes involved in these activities?
People have been investigating genes involved in IQ in humans for a very long time. Despite the ethical conundrums involved in these studies, there’s a basic interest involved in what makes some people very smart and some not so smart. It is well known that there are very smart strains of mice and very dumb strains. We used the hippocampus, which is essential in learning and memory, to look for essential gene expression differences between two smart strains and two dumb strains.
When you did these experiments, you hybridized your mouse hippocampal RNA to a 9,000-spot mouse cDNA array. How did you choose your cDNAs for this array, and did you pick out genes that were previously implicated in learning and memory?
We purchased a random collection of 9,000 mouse cDNA clones from Research Genetics, and we spotted the arrays ourselves, partly as a matter of convenience. At UCLA we have a central core facility that makes spotted microarrays, and we have recently been funded by the NSF to make our own core facility. Because our friends and neighbors were making their own spotted arrays, it seemed natural to set up our own facility.
What aspect of microarray experimentation posed the greatest obstacle or challenge to you, and how did you address it?
The main breakthrough for us was teaming up with a set of mathematically sophisticated engineers at the University of Southern California. We were really quite naïve about how noisy and how variable the data is. By collaborating with these people, who had a lot of expertise in extracting consistent signals from noisy data in cell phones, radios, etc., we were able to take data that appeared to have no rhyme or reason to it and make sense of it.
This focus on data analysis was a strange experience for me, as I am more of a wet lab scientist. The actual physical experiments were done quite quickly. What was very time-consuming was the analysis of the data. I personally find it very exciting, however, as my first degree was in physics and I do have an interest in mathematical and computational approaches to biology.
In the paper, you state that a co-author, Alex Ossadtchi, processed the data and removed spatial trends due to variations in printing of the chips. Can you explain this data processing step?
Alex Ossadtchi is a graduate student at USC, working under Richard Leahy [director of the university’s Signal and Image Processing Institute]. What they found is that if you did a standard scatter plot of Cy3 and Cy5 intensities, you would see a scatter plot that looked not like a straight line, but like a twig snapped in two at different places. They found that these segments of different slopes corresponded to different spatial areas of the arrays that had different efficiencies of hybridization. What they meant by removing spatial trends in the data was placing the different segments of the twig onto one scatter plot with one slope.
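To make the idea of putting the segments onto one slope concrete, here is a minimal Python sketch. It is not the procedure used in the paper. It assumes each spot is labeled with the region of the array it was printed in (the hypothetical `block` label), fits one Cy5-versus-Cy3 slope per region through the origin, and rescales Cy5 so that every region falls on a common slope of one. The function name and data layout are illustrative.

```python
from collections import defaultdict


def remove_block_slopes(spots):
    """Rescale Cy5 so every array region shares a Cy5/Cy3 slope of one.

    spots: list of (block, cy3, cy5) tuples, where `block` is an assumed
    label for the spatial region of the array the spot belongs to.
    Intensities are assumed positive, so each region has a nonzero slope.
    Returns (corrected_spots, slopes).
    """
    # Accumulate sum(cy3 * cy5) and sum(cy3 * cy3) for each region.
    sums = defaultdict(lambda: [0.0, 0.0])
    for block, cy3, cy5 in spots:
        sums[block][0] += cy3 * cy5
        sums[block][1] += cy3 * cy3

    # Least-squares slope through the origin, one per region.
    slopes = {block: sxy / sxx for block, (sxy, sxx) in sums.items()}

    # Dividing Cy5 by its region's slope maps that segment onto slope one.
    corrected = [(block, cy3, cy5 / slopes[block]) for block, cy3, cy5 in spots]
    return corrected, slopes
```

With two regions whose raw slopes differ, the corrected spots from both regions line up on a single line. A real analysis would more likely work on log intensities and model smooth spatial variation rather than discrete blocks.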
Instead of doing a dye flip — reversing the Cy3 and Cy5 and doing a replicate of a hybridization — to correct for dye bias, initially, you aligned the histograms of the Cy3 and Cy5 signals. Later you did a dye flip to confirm your findings of the 27 differentially expressed genes you found, and confirmed 20 this way. Why didn’t you do a dye flip from the beginning?
Other investigators who discovered this phenomenon about halfway through our study drew our attention to it, and we added the dye flip later as a semi-ad-hoc procedure. Now we’re much more aware of this potential confounding factor and we try to build in dye flips from the beginning. But in some of the experiments we’re doing, we’re taking the philosophy that microarrays are not the final analysis but a useful screening tool we use to home in on a few genes of substantial interest. In that case we don’t regard dye flips as being essential.
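The histogram alignment mentioned in the question can be illustrated with a short Python sketch. This is one common way to align two intensity distributions (rank matching), offered as an assumption about what such a step could look like, not as the method in the paper. Each Cy5 value is replaced by the Cy3 value of the same rank, so the two channels end up with identical histograms while the ordering of spots within Cy5 is preserved.

```python
def align_histogram(source, reference):
    """Give `source` the same distribution of values as `reference`.

    Each source value is replaced by the reference value of equal rank.
    Both lists must have the same length (one entry per spot).
    """
    if len(source) != len(reference):
        raise ValueError("source and reference must have the same length")

    ref_sorted = sorted(reference)
    # Indices of the source spots, ordered from lowest to highest intensity.
    order = sorted(range(len(source)), key=lambda i: source[i])

    aligned = [0.0] * len(source)
    for rank, i in enumerate(order):
        aligned[i] = ref_sorted[rank]
    return aligned
```

For example, `align_histogram(cy5, cy3)` returns Cy5 values rescaled to the Cy3 distribution. Unlike a dye flip, this corrects only the overall intensity distribution and cannot detect a bias that affects individual genes.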
In your work, you discovered 27 genes that were consistently differentially expressed between the two strains of smart mice and the two strains of dumb ones. Have you since gone back and tried to fit these genes into a cellular model of learning and memory?
We did some simple bioinformatics work and literature searches to see if we could place these genes on a common unifying pathway. But many are of unknown function. We are next going to analyze the genes—and their function—one by one using engineered mice. We hope to take the analysis to the next step this way.
In your paper, you said that you used a threefold cutoff for differential expression of genes, but also used a t-statistic to determine whether genes were significantly differentially expressed. Can you explain your methods, and which do you prefer?
In general we’re strong advocates not of an arbitrary twofold or threefold cutoff but of some more flexible tool such as the t-statistic. When genes are strongly expressed, you can be confident of a difference of 10 to 15 percent. If they are expressed at a low level, then a twofold cutoff becomes more appropriate.
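The contrast between a fixed fold cutoff and a t-statistic can be sketched in a few lines of Python. The example assumes replicate intensities for one gene in two groups and uses Welch's unequal-variance form of the t-statistic, which is a choice made here for illustration and not necessarily the variant used in the paper. The function names are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def fold_change(a, b):
    """Ratio of mean intensities between two groups of replicates."""
    return mean(a) / mean(b)


def welch_t(a, b):
    """Welch's t-statistic: mean difference scaled by its standard error.

    Each group needs at least two replicates and nonzero spread.
    """
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error


# Illustrative (made-up) replicate intensities for two genes.
strong_a = [11200, 11150, 11250, 11210]   # strongly expressed, tight replicates
strong_b = [10000, 10050, 9950, 10010]
weak_a = [80, 20, 110, 30]                # weakly expressed, noisy replicates
weak_b = [30, 25, 35, 30]
```

In this toy data the strongly expressed gene differs by only about 12 percent yet has a very large t-statistic, while the weakly expressed gene shows a twofold change with a small t-statistic because its replicates are noisy. A fixed fold cutoff would rank the two genes the opposite way.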
You used principal components analysis to analyze the data. Given that you had a complex experiment with two types in one group (smart mice) and two in another (dumb mice), why did you not use ANOVA, which looks at between- and within-group variability?
Again, ANOVA is an approach that was developed [for microarray analysis] while we were in the middle of this work. We didn’t include such analysis in our paper, but if we were to reanalyze the data now, we might use ANOVA from the get-go.
PCA and ANOVA are quite complementary. PCA is excellent as a discovery tool. In contrast, the ANOVA approach is much more rigid. You have to have specific predictions or hypotheses about the effects on the data, and you might miss certain features of the data. The advantage is, ANOVA provides a much more rigorous approach to assigning confidence to the data. PCA is an excellent discovery tool but the rigor isn’t quite as strong.
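As a rough illustration of the between- and within-group comparison that ANOVA makes, the following Python sketch computes a one-way F statistic for a single gene. It is a simplification: it treats the smart and dumb groups as the only factor and ignores strain within group, which a full analysis of this design would include. The function name is illustrative.

```python
from statistics import mean


def one_way_f(groups):
    """One-way ANOVA F statistic for one gene.

    groups: list of lists of expression values, one list per group.
    Requires at least two groups, more values than groups, and some
    within-group spread (otherwise the denominator is zero).
    """
    all_values = [value for group in groups for value in group]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)

    # Variability of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variability of individual values around their own group mean.
    ss_within = sum((value - mean(g)) ** 2 for g in groups for value in g)

    return (ss_between / (k - 1)) / (ss_within / (n - k))
```

A large F means the groups differ by more than the scatter within each group would explain. Turning F into a confidence level requires the F distribution with (k - 1, n - k) degrees of freedom, which is omitted here.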
What advice would you give to a colleague who is just starting to use microarrays?
If you can find a friendly person who has a lot of experience in mathematical analysis of the data, it would be a tremendous help. Unfortunately, now these individuals are like gold dust.