AT A GLANCE:
Associate Professor, Department of Human Genetics, and Principal Investigator, Montreal Genome Center, McGill University
Past experience: Director of Informatics, Imaging Research
Associate Professor, Brock University, Ontario, Canada
PhD, Experimental Psychology, Concordia University, Montreal, Canada
What has changed in microarray experiments and data analysis over the last few years?
If I go back five years, when I first started at Imaging Research, virtually all the microarray analyses in the public literature were non-statistical, and in fact most of the studies at that time had no replicates. Since then, practicing scientists have embraced the notion of obtaining replicates. Certainly cost has been a big issue, and costs have come down since then. Not obtaining replicates, though, and getting unreliable results, also has a cost associated with it, and I think that over time, experience has led the practicing scientist to acknowledge that.
The other big change is that initially, many experimental scientists did not believe that statistics had anything to offer. Some of them, not just those using microarrays, argued that if you need to use statistics to find an effect, the effect can’t possibly be important. And I think that notion is based on a misunderstanding of the error characteristics of the data, and of what statistics can do.
Another big change is that the experimental designs have become a lot more complex. The questions that are being asked by the designs are more interesting, but also more difficult to handle statistically.
What advice do you have for a researcher who is planning a large microarray study?
The simple answer is: consult a statistician. Statisticians have a lot to offer not only in data analysis but also in designing the experiment. Everyone I know who consults on these matters asks the scientists to speak with them before they design the study, not afterwards, because if a confounding effect is present in the data, a statistician can’t remove it; it’s there and you are stuck with it. To give a simple example: if I test all my control slides on one day and all my experimental slides on another day, and I get a treatment-control difference, I can’t tell whether the difference is due to the treatment manipulation or to a “day effect.” That’s why you want to set up your experiment so that your interpretation is as straightforward as possible, so that you are not left asking yourself “Is it due to the biology, or is it due to how I conducted my experiment?”
There are simple things that one can do to counterbalance potential confounding effects. For example, in nylon membrane technologies, the membranes are often reused, and the signals degrade with increased use. If you have two membranes, use one for control and one for treatment, then run them and wash them. When you repeat the experiment, switch the membranes, so that you are not always using the same membrane for treatment or for control. If you are not able to run your experiment in one day, run the treatments and controls in pairs. Try to anticipate parts of your procedure that might affect your results and take that into consideration with your design.
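The membrane-swapping idea above can be written down as a tiny scheduling sketch. This is purely illustrative: the membrane labels "A" and "B" and the function name are my own, not part of any published protocol.

```python
def counterbalanced_schedule(n_runs):
    """Alternate which membrane carries the treatment sample on each run,
    so that membrane wear is not confounded with the treatment-control
    comparison."""
    schedule = []
    for run in range(n_runs):
        if run % 2 == 0:
            schedule.append({"run": run + 1, "treatment": "A", "control": "B"})
        else:
            schedule.append({"run": run + 1, "treatment": "B", "control": "A"})
    return schedule

for entry in counterbalanced_schedule(4):
    print(entry)
```

Over an even number of runs, each membrane spends equal time in each condition, so any signal degradation from reuse averages out across conditions instead of piling up in one of them.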
Also, before you dive in with biological replicates, get a good understanding of what your technical variation is, and control it to the extent that you can. That way you don’t have to fix it later using some statistical methodology, but you get it under control experimentally, which is always more desirable.
What is the merit of doing technical replicates?
Some people refer to technical replicates as repeated measurements. They have value early on, when you are setting up your laboratory and your experimental protocols, and as a way of coming back occasionally to check that everything is still going correctly. You get a sense, for example, of the random error you expect from your data. And not being concerned about biology has the advantage that at least you are not confusing the issues. When your later experiments include biological variation, the overall variation will almost certainly go up, but at least you will have the technical component as low as possible.
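One common way to summarize the random error seen in technical replicates is the coefficient of variation. A minimal sketch, using made-up intensity values and only the standard library:

```python
import statistics

def technical_cv(replicates):
    """Coefficient of variation (sd / mean) across technical replicates
    of one sample: a simple summary of random, non-biological error."""
    mean = statistics.mean(replicates)
    sd = statistics.stdev(replicates)
    return sd / mean

# e.g. three repeated hybridizations of the same RNA sample (invented values)
print(round(technical_cv([980, 1020, 1000]), 3))  # → 0.02
```

Tracking this number while tuning a protocol shows whether technical variation is under control before biological replicates, with their additional variation, enter the picture.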
Which statistical methods for gene expression data analysis do you recommend?
There is the model that my colleagues and I at Imaging Research developed, the ArrayStat software. There are many other methods: there is SAM [Significance Analysis of Microarrays] by Rob Tibshirani and his colleagues at Stanford; for Affymetrix chips in particular there is the dChip [DNA-Chip Analyzer] software from Wing Wong at Harvard; and there is the RMA methodology put out by Terry Speed and his colleagues at the University of California at Berkeley.
There are many approaches with common themes. The issues are well recognized: one has to normalize, be concerned about outliers and how background correction is done, correctly estimate random error, perform power analysis when planning experiments, and so on. There is agreement on the importance of these various issues, and there are slight variations in how one addresses them. I think with high-quality data, the results from various statistical models will hopefully be reasonably similar. The comparisons have begun to be done, but I think the jury is still out. In part the problem is that data quality is often low, and with low-quality data, even the best statistical model is going to produce less than optimal results.
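The methods named above each handle normalization in their own, more sophisticated way; as a minimal sketch of what normalization means at all, here is simple median-centering of one array's log intensities. The values are invented for illustration.

```python
import statistics

def median_center(log_intensities):
    """Median-center one array's log intensities: subtracting the array
    median removes an overall array-to-array offset, one of the simplest
    possible normalizations."""
    m = statistics.median(log_intensities)
    return [x - m for x in log_intensities]

print(median_center([2.0, 3.0, 5.0]))  # → [-1.0, 0.0, 2.0]
```

Real pipelines go well beyond this (intensity-dependent and non-linear corrections, for example), but the goal is the same: remove systematic, non-biological differences between arrays before comparing them.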
How do you strike the right balance between specificity and sensitivity, that is, between false positives and false negatives?
Balancing the two at all stages of the research design is important. Perhaps the best analogy is the diagnosis of a potentially fatal disease. A good initial diagnostic test should minimize false negatives as far as possible. In other words, if someone has the disease, you want them identified. The false positive rate may be quite high, which, psychologically, might be quite difficult for the person receiving the news. Nevertheless, since this is a first test, minimizing false negatives is paramount. As you proceed further down the line, there is going to be a follow-up test, and now the balance may shift a little. Certainly at the point where you are contemplating surgery, which is a very invasive technique, the balance is going to shift even more, because you don’t want someone to undergo serious surgery if it’s not necessary.
By analogy, in a series of studies, early on you may want to avoid false negatives, in other words, you want to make sure that in your screen you get all the effects that you are looking for. Microarrays are often a screening technology, and you can verify the results subsequently.
That being said, with the sheer number of genes that we test in microarrays, even if the false positive rate is low, the number of genes that are false positives can still be extremely large. This is especially true when only a very small number of genes is truly differentially expressed. If one does not control for the false positives, one can easily end up with hundreds of false positives relative to a very small number of true hits, and that ratio would, for most people, be unacceptable.
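The arithmetic behind this point is worth making explicit. A minimal sketch, with invented but typical numbers:

```python
def expected_false_positives(n_genes, n_true, alpha):
    """Expected number of false positives when testing n_genes genes at a
    per-gene significance threshold alpha, if only n_true genes are truly
    differentially expressed (the rest are null)."""
    return (n_genes - n_true) * alpha

# 10,000 genes on the array, only 50 truly changed, per-gene alpha = 0.05:
fp = expected_false_positives(10_000, 50, 0.05)
print(fp)  # → 497.5
```

Nearly 500 expected false positives against at most 50 true hits: exactly the unacceptable ratio described above, and the reason multiple-testing control matters.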
What do you think about databases for gene expression data?
The sequence databases have proven to be extremely valuable, despite the known errors in the sequences. The beauty is that once you have a sequence, it’s an absolute string. In terms of expression, the results are relative, not absolute. Within a laboratory, researchers can run, for example, two chips, get some results, and then run three more chips, presumably on the same experiment, and get quite different results. It is very difficult to combine the data, even within a laboratory, because the effects are often non-linear. The public expression databases want to do even more than that — they want to compare across time, across laboratories, across technologies, which has proven an extremely difficult thing to do.
The field is still holding out hope for very good universal standards, but even the application of standards across various experiments is bound to be difficult. Preferably, you would want the standard to have very good precision; you do not want it to fluctuate very much.
That being said, public expression databases can still serve useful purposes. For example, if a particular laboratory found a list of genes to be differentially expressed, you could do a similar experiment, perhaps with a different organism, and say “Look, I found these genes and they have similar sequences.” But if one wants to, for example, take a control group from a public expression database, run an experimental group, and compare them, that would be extraordinarily difficult.
How about using expression databases for meta-analysis?
Meta-analysis is an underused technique; I have never seen it applied to microarrays, but it is entirely feasible. For example, if you were trying to get a sense of the effect size of a particular treatment, that’s where meta-analysis can be really useful. When a number of different studies have looked at similar questions, using perhaps different technologies or different organisms, a meta-analysis would be a really interesting thing to do. What you do is group experiments, look for consistency between them, and see whether there is a reasonable estimate of the effect size of the various differential expressions. That would be a great use for public expression databases.
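The standard textbook way to combine an effect-size estimate across studies is inverse-variance (fixed-effect) weighting. A minimal sketch, with hypothetical study values; the interview does not prescribe this particular method, it is just the most common one:

```python
def pooled_effect(effects, variances):
    """Fixed-effect meta-analysis: inverse-variance weighted average of
    per-study effect estimates, plus the variance of the pooled estimate.
    More precise studies (smaller variance) get larger weights."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    pooled_var = 1.0 / sum(weights)
    return pooled, pooled_var

# Three hypothetical studies estimating the same log-fold change for a gene:
est, var = pooled_effect([0.8, 1.2, 1.0], [0.04, 0.09, 0.05])
```

With expression data the hard part is the inputs, not this formula: as noted above, getting effect estimates onto a comparable scale across laboratories and technologies is what remains difficult.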