At A Glance
Received his PhD from Hofstra University in 1990 in clinical and school psychology. Wrote thesis on empirically derived typologies of obese persons.
1990-1991 — Postdoctoral research at Johns Hopkins University School of Medicine and New York Obesity Research Center at St. Luke’s / Roosevelt Hospital Center
1991-1994 — Research scientist at the New York Obesity Research Center and associate professor of medical psychology at Columbia University College of Physicians and Surgeons.
2001 — joined University of Alabama-Birmingham as professor of biostatistics and head of the section on statistical genetics, associate director of Clinical Nutrition Research Center.
Grant scoreboard: $42 million in grants as principal investigator or collaborator.
What’s a clinical psychologist doing clothed in the garb of a biostatistician? David Allison, a professor at the University of Alabama-Birmingham, might be the prototype for a cross-disciplinary scientist. He is clearly an academician not afraid of straddling traditional boundaries: He is a native New Yorker now living in the heart of the South, and he is carrying on his research interests in obesity while expanding the frontiers of statistical analysis of microarray research.
Earlier this month, Allison, 39, and his team earned a four-year, $2.2 million grant from the National Science Foundation to develop and evaluate the use of scientific methods of microarray technology. Add that money to a total of $42 million in grants he is involved in.
Recently, he got on the couch to talk microarrays with BioArray News:
You have a PhD in psychology, and now you are deep into biostatistics. How did you make that leap in your career?
Evolution. I’ve always been interested in obesity; I’ve been studying it since I was an undergrad. I got interested in statistics largely for practical reasons. We were studying certain topics in tests and measures. I felt that if I was going to be using IQ tests, I should understand these kinds of statistics better.
I went to Columbia University and St. Luke’s Roosevelt Hospital for a post-graduate fellowship, and saw in Science magazine an opportunity to attend NATO advanced research workshops. So, I wrote a letter that said I would like to be considered. Not only did they invite me to the meeting, but they gave me a little grant to pay my expenses. I got hooked on genetics, on the statistical aspects of it. That Mendelian theory, those simple two laws, the law of segregation and the law of independent assortment, told you so much, gave you so much information about the expected pattern among data, that you could fit very complex and sophisticated models to things and learn very interesting things.
Tell me about your first contact with microarrays.
That happened in 1999. I collaborate extensively with Dr. Richard Weindruch of the University of Wisconsin at Madison. My roles in those studies, which involve caloric restriction in aging, mainly involve statistical analysis. Flying out to visit him, I came across a paper in the August 1999 issue of Science. I read it with great fascination. He had this long list of genes that he thought were differentially expressed between old and young, or calorically restricted and ad-lib, mice, but there were no P values, no confidence intervals, and no standard errors, and I thought: How does he know that they were differentially expressed? So, I said, maybe you need some help with this, and he said, yes, we do.
I did a literature search and found that, in the whole field, there was very little written at that time about how to statistically analyze microarray data, and what was there was almost exclusively focused on cluster analysis. That is a perfectly reasonable technique and has some utility, but it has a lot of problems associated with it and doesn’t answer the question that, in my experience, Dr. Weindruch and other investigators want to answer.
What we have always emphasized in the grants that we have written is that we are looking to work from a rigorous epistemological foundation. We are saying it is not enough to have a method that intuitively seems reasonable, not enough to have a method that makes a pretty graph, nor results on one dataset that seem to make sense. The method has to be either proven mathematically or demonstrated by extensive computer simulations to have certain properties.
In reading your CV, you are a part of some $42 million in current grants — give or take a few thousand dollars. You must write one hellacious proposal.
I do all right. That number is accurate as describing the breadth it covers. But that’s not $42 million in my accounts here at UAB this year. Some of those funds get spent as five-year grants, some get spent at Oxford and some get spent at Columbia. It’s spread across many projects, many universities.
How many graduate students do you have working for you?
One. He is a good one. I work his butt off. I’m relatively new here and I haven’t had a chance to recruit that many grad students. I have a half dozen post-docs and six faculty working with me.
Let’s talk about data normalization. Will the grant you have just received help set parameters for that?
We have multiple aspects to it. We think of what statisticians and analysts do, for the most part, as being lumped into five categories: measurement, design, inference, estimation, and classification. Our proposal is structured around those five themes. We put normalization under measurement. How do we go from knowing here is a particular tissue or animal or what-have-you, in which we want to measure gene expression, to saying: Here is the ultimate number that we are going to stick into our formal inferential statistical test, our t test or ANOVA? How do we generate our numbers? We address what accounts for the most variability, or error variance, in measurements. Is it the preparation of tissue? Is it the software used to extract a numerical piece of information from an image? Is it the normalization routine? We will work on what are better or worse normalization routines. I don’t profess to know the answer yet. The software we use for ourselves does three or four different normalization routines. We want to know for ourselves what makes the most sense. There are things in the Affymetrix oligonucleotide systems, the so-called perfect match (PM) and mismatch (MM) probes, and there are a lot of questions about them now, and the entire field is waking up to that. The simple subtraction that has been done may not be optimal, so the question is: Should we use simple subtraction, or should we use just the PM alone and ignore the MM?
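The choice Allison describes, simple subtraction versus PM alone, can be made concrete with a toy calculation. The sketch below is only an illustration: the function names and intensity values are hypothetical, and the two summaries are plain averages, not the algorithms used by Affymetrix software or by Allison’s group.

```python
def mean_pm_minus_mm(pm, mm):
    """Average of (PM - MM) over the probe pairs of one probe set."""
    if not pm or len(pm) != len(mm):
        raise ValueError("pm and mm must be non-empty and the same length")
    return sum(p - m for p, m in zip(pm, mm)) / len(pm)


def mean_pm_only(pm):
    """Average of the PM intensities alone, ignoring the MM probes."""
    if not pm:
        raise ValueError("pm must be non-empty")
    return sum(pm) / len(pm)


# Hypothetical intensities for one probe set with three probe pairs.
pm = [120.0, 200.0, 90.0]
mm = [100.0, 50.0, 110.0]  # the third MM is brighter than its PM

print("subtraction summary:", mean_pm_minus_mm(pm, mm))
print("PM-only summary:", mean_pm_only(pm))
```

In this made-up probe set, one mismatch probe is brighter than its perfect match partner, so that pair contributes a negative difference to the subtraction summary. Cases like this are one reason the choice between the two summaries is a real question.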
What is your favorite algorithm for normalization?
We are using quantile normalization. I couldn’t honestly say that I have a reason for believing it’s the best. We have two lives: One is developing and evaluating methods, and the other is analyzing real data, and we have to do them both. That creates a challenge because we are building a plane as we are flying it. We can’t always be using analysis that has as strong an epistemological foundation as we would like, and we can’t always be making decisions with as much knowledge and information as we wish we had. We try to be very explicit with our colleagues about where we are confident and where we are not. With our normalization, we say, we can give you three or four options. We believe, at a gut level, that this one seems as good as or better than any of the other choices we have, but we can’t give you any kind of real proof or solid evidence that it is any better. That’s where we are in terms of normalization.
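For readers unfamiliar with quantile normalization, here is a minimal sketch of the general technique, not the software Allison’s group uses. It assumes each chip is a list of intensities for the same genes in the same order, and it breaks ties by position rather than averaging them, which is a simplification. The function name and sample values are hypothetical.

```python
def quantile_normalize(chips):
    """Force every chip to share the same intensity distribution.

    chips: list of equal-length lists, one list of intensities per chip.
    Returns new lists; the inputs are not modified.
    """
    if not chips or not chips[0]:
        raise ValueError("need at least one non-empty chip")
    n = len(chips[0])
    if any(len(chip) != n for chip in chips):
        raise ValueError("all chips must have the same number of genes")

    # Reference distribution: the mean across chips at each rank.
    sorted_chips = [sorted(chip) for chip in chips]
    reference = [
        sum(s[rank] for s in sorted_chips) / len(chips) for rank in range(n)
    ]

    normalized = []
    for chip in chips:
        # Gene indices ordered from lowest to highest intensity on this chip.
        # Ties keep their original order (a simplification).
        order = sorted(range(n), key=lambda i: chip[i])
        out = [0.0] * n
        for rank, gene_index in enumerate(order):
            out[gene_index] = reference[rank]
        normalized.append(out)
    return normalized


# Two hypothetical chips measuring the same three genes.
chips = [[2.0, 4.0, 6.0], [10.0, 30.0, 20.0]]
print(quantile_normalize(chips))
```

After normalization, both chips contain exactly the same set of values, and each gene keeps its within-chip rank. Whether that is the right assumption for a given experiment is the kind of question Allison says remains open.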
If money was no object, what kind of microarray work would you do? Or, now, is money no object?
Money is always an object. I would, first of all, build a much bigger resource team than I have. I would multiply, by multiple orders of magnitude, my computing power. Second, by several orders of magnitude, I would increase the number of programmers at my disposal. Third, I would start doing massive very well-organized factorial experiments in the wet lab, to explore the measurement properties of microarrays.
What kind of money are you talking about to do this?
$10 million would go a long way and do most of that. There are some pretty important questions that need to be answered.
How about a lab?
If I have a cell phone, a credit card, a pencil and a laptop computer, that’s my ideal lab.
In microarray analysis there is a divergence between the high end chips, like Affymetrix, and the low end, the do-it-yourself chips.
Right now, I can’t tell you which one is better. I think that one of the critical questions will be cost. If the Affymetrix-type systems come down a great deal, great. What is important to realize is that a system that has somewhat lesser reliability and is markedly cheaper may be superior, from an experimental design point of view, to a system that has greater reliability and is much more expensive, because you can then run many more subjects on the less expensive system. Suppose you are in a situation where the things you are studying, mice or cell lines or whatever, are very inexpensive and it is easy to generate lots of replicates, but chips are very expensive for an Affymetrix-type system and inexpensive for a cDNA system. Even if the Affymetrix system were more reliable, if you can get k times more chips for the same amount of money with system B rather than A, you have to ask yourself: Is system A really more valuable than B?
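One way to put numbers on this trade-off is to compare the standard error of a group mean under a fixed budget. The sketch below rests on assumptions that are not in the interview: biological and technical variances simply add, each chip measures a different subject, subjects cost nothing, and every price and variance shown is hypothetical.

```python
import math


def standard_error(budget, cost_per_chip, bio_var, tech_var):
    """Standard error of a group mean when the whole budget buys chips.

    Assumes one subject per chip and additive biological and technical
    variance, so each measurement has variance bio_var + tech_var.
    """
    n_chips = int(budget // cost_per_chip)
    if n_chips < 1:
        raise ValueError("budget does not cover a single chip")
    return math.sqrt((bio_var + tech_var) / n_chips)


budget = 10_000.0  # hypothetical budget, same for both systems
bio_var = 1.0      # hypothetical biological variance between subjects

# System A: more reliable (low technical variance) but expensive.
se_a = standard_error(budget, cost_per_chip=500.0, bio_var=bio_var, tech_var=0.1)
# System B: less reliable but k = 5 times cheaper per chip.
se_b = standard_error(budget, cost_per_chip=100.0, bio_var=bio_var, tech_var=0.5)

print(f"System A: SE = {se_a:.3f}")
print(f"System B: SE = {se_b:.3f}")
```

With these made-up numbers the cheaper, noisier system gives the smaller standard error, because the extra subjects more than offset the added measurement noise. Different costs or variances could reverse the result, which is why the question has to be asked case by case.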