At A Glance
Rork Kuick, a statistician in the University of Michigan Medical School lab of Samir Hanash, has recently worked with other statisticians to develop software that normalizes and analyzes data from the lab's Affymetrix microarrays.
BioArray News recently discussed this program with Kuick.
Q: First, can you tell me a little bit about Sam Hanash's lab and how you use microarrays?
A: In the lab we have both an Affymetrix facility and the ability to dot materials, mostly proteins and cDNAs, on slides. Our primary focus is studying cancers. Dr. Hanash, a professor of pediatrics and communicable diseases, has been awarded a "Director's Challenge" grant from the National Cancer Institute (see the website, http://dc.nci.nih.gov) to profile lung, ovary, and colon tumors. We also study brain and pancreas tumors, among others.
I have been working in Sam's lab since about 1985, after receiving my Master's degree in statistics from the University of Michigan.
Q: You said you use Affymetrix arrays. Recently, a number of people have commented that prefabricated oligo arrays like Affy's are going to replace cDNA arrays. What do you think about this?
A: We opted to do some of our work with Affymetrix chips because they offered the potential for a quicker start-up. Other arraying methods can also give good data, and I wouldn't want to guess what methods we will be using in two years.
Q: You recently developed your own software, "readaffy," to perform the initial analysis of Affymetrix chip data. Can you tell me about this software and what led you to develop your own approach?
A: Our initial motivation for writing our own software to summarize the probe-pair data from Affymetrix chips was that we had observed bad scanner saturation on many chips, and we were not happy with the trimming procedure that the Affymetrix software used. Most scanners of this type have since been adjusted so that saturation is much less of an issue, but there are still piles of data from chips that have saturation.
Summarizing a probe set means boiling the 16 to 20 probe pairs per transcript on the chip down to a single number for that transcript. The Affymetrix software did this by averaging the perfect match (PM) minus mismatch (MM) differences for the probe set, after trimming any PM-MM values that were about three standard deviations or more from the mean PM-MM for that probe set. Unfortunately, on some chips a very large or small PM-MM would be trimmed, while on other chips, where it was not quite as far out, it would not be.
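To see why a cutoff measured in standard deviations can treat similar outliers inconsistently, here is a minimal Python sketch that averages PM-MM values after dropping anything more than three sample standard deviations from the probe-set mean. It is a simplified reading of the rule described above, not Affymetrix's actual code; the function name `sd_trimmed_average`, the fixed cutoff `k=3`, and the 16 synthetic PM-MM values are illustrative assumptions. Because the outlier itself inflates the standard deviation, whether it crosses the cutoff depends on the rest of the probe set.

```python
import statistics


def sd_trimmed_average(diffs, k=3.0):
    """Average PM-MM values after dropping those more than k sample standard
    deviations from the probe-set mean (hypothetical simplified rule).

    Returns (average of kept values, number of values trimmed).
    """
    mean = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # computed with the outlier included
    kept = [d for d in diffs if abs(d - mean) <= k * sd]
    return statistics.mean(kept), len(diffs) - len(kept)


# 15 hypothetical, well-behaved PM-MM values plus one outlier.
base = [90.0] * 7 + [100.0] + [110.0] * 7
print(sd_trimmed_average(base + [160.0]))  # (100.0, 1): outlier trimmed
print(sd_trimmed_average(base + [145.0]))  # (102.8125, 0): outlier kept
```

With identical background values, the more extreme outlier (160) is discarded and leaves the summary at 100.0, while the milder one (145) survives and pulls the summary up: the more aberrant probe pair ends up having less influence.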
This inconsistency, and the fact that PM-MM data are rather wild, led us to design the readaffy software to average the PM-MM values after throwing away the top and bottom 25 percent of them for each probe set.
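A 25 percent trimmed mean (sometimes called the interquartile mean) can be sketched in a few lines of Python. The function name `trimmed_mean_25` is hypothetical, and rounding the number of dropped values down when the probe-set size is not a multiple of four is an assumption; readaffy's exact handling of that case is not described here. The synthetic data reproduce a well-behaved probe set with one outlier at either of two sizes.

```python
def trimmed_mean_25(diffs):
    """Average PM-MM values after discarding the lowest and highest 25 percent.

    The count dropped from each end is len(diffs) // 4 (rounded down), an
    assumption for probe sets whose size is not a multiple of four.
    """
    values = sorted(diffs)
    n = len(values)
    cut = n // 4
    kept = values[cut:n - cut]
    return sum(kept) / len(kept)


# 15 hypothetical, well-behaved PM-MM values plus one outlier.
base = [90.0] * 7 + [100.0] + [110.0] * 7
print(trimmed_mean_25(base + [160.0]))  # 101.25
print(trimmed_mean_25(base + [145.0]))  # 101.25
```

Because the cut is positional rather than based on the spread of the data, a single extreme value is always discarded, so the summary no longer jumps depending on how far out the outlier happens to lie. `scipy.stats.trim_mean(x, 0.25)` performs a similar computation if SciPy is available.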
Finally, I wanted conventional statistical tests to replace the Affymetrix "calls" of whether a transcript is present and whether there is more of the transcript in one sample than in another. The readaffy software uses Wilcoxon signed-rank tests in both cases.
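An exact Wilcoxon signed-rank test is small enough to write in plain Python. The sketch below assumes, since the interview does not spell it out, that a "present" call tests whether a probe set's PM-MM values tend to be above zero (one-sided), and that a "change" call tests the probe-by-probe differences in PM-MM between two chips that are already on a common scale (two-sided). Dropping zero differences and averaging tied ranks are conventional choices assumed here, not confirmed details of readaffy; the function name `signed_rank_pvalue` and all data values are hypothetical. `scipy.stats.wilcoxon` is a tested alternative.

```python
def signed_rank_pvalue(diffs, alternative="greater"):
    """Exact Wilcoxon signed-rank p-value for paired differences.

    alternative="greater": are the differences shifted above zero?
    alternative="two-sided": are they shifted away from zero in either direction?
    Zero differences are dropped; tied magnitudes get average ranks.
    """
    if alternative not in ("greater", "two-sided"):
        raise ValueError("alternative must be 'greater' or 'two-sided'")
    d = [x for x in diffs if x != 0]
    n = len(d)
    if n == 0:
        return 1.0
    # Rank |d| from smallest to largest, averaging ranks over ties.
    # Ranks are stored doubled (2 * rank) so averaged ranks stay integers.
    order = sorted(range(n), key=lambda i: abs(d[i]))
    rank2 = [0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        for k in range(i, j + 1):
            rank2[order[k]] = i + j + 2  # 2 * average of ranks i+1 .. j+1
        i = j + 1
    w_plus = sum(r for r, x in zip(rank2, d) if x > 0)  # doubled W+ statistic
    # Under the null hypothesis each rank is positive or negative with
    # probability 1/2; count how many of the 2**n sign patterns give each
    # doubled W+ value (a 0/1 subset-sum count).
    total = sum(rank2)
    counts = [0] * (total + 1)
    counts[0] = 1
    for r in rank2:
        for s in range(total, r - 1, -1):
            counts[s] += counts[s - r]
    n_patterns = 2 ** n
    p_upper = sum(counts[w_plus:]) / n_patterns      # P(W+ >= observed)
    if alternative == "greater":
        return p_upper
    p_lower = sum(counts[:w_plus + 1]) / n_patterns  # P(W+ <= observed)
    return min(1.0, 2 * min(p_upper, p_lower))


# Hypothetical PM-MM values for one 16-probe-pair probe set ("present" call).
pm_mm = [120.0, 85.0, -10.0, 240.0, 60.0, 35.0, 150.0, 95.0,
         -25.0, 70.0, 180.0, 55.0, 40.0, 110.0, 20.0, 65.0]
print(signed_rank_pvalue(pm_mm, "greater"))

# Hypothetical PM-MM values for one probe set on two chips ("change" call).
chip_a = [120.0, 85.0, 240.0, 60.0, 150.0, 95.0, 70.0, 180.0]
chip_b = [90.0, 80.0, 200.0, 65.0, 100.0, 60.0, 50.0, 120.0]
print(signed_rank_pvalue([a - b for a, b in zip(chip_a, chip_b)], "two-sided"))
```

Enumerating the null distribution exactly is cheap at 16 to 20 probe pairs, so no normal approximation is needed, and the test makes no assumption that PM-MM values are normally distributed.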
But even with our presumed improvements, we still had some trouble normalizing chips, because the distributions of values from different chips did not have the same shape. Rather than making the means of two chips equal, or the means and the variances, we use quantile-normalization software written by Kerby Shedden, a professor in the University of Michigan Department of Statistics. It makes the quantiles of the two distributions match to a degree that is under user control.
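Standard quantile normalization replaces each chip's k-th smallest value with the average of the k-th smallest values across all chips, so that every chip ends up with the same distribution. The Python sketch below does this; the `alpha` parameter, which blends each original value with its fully quantile-normalized value, is a hypothetical stand-in for the "degree under user control" mentioned above and is not necessarily how Shedden's program implements it. The function name, the toy data, position-based tie-breaking, and the requirement that all chips have the same number of values are likewise assumptions.

```python
def quantile_normalize(chips, alpha=1.0):
    """Quantile-normalize a list of chips (equal-length lists of intensities).

    alpha=1.0 gives every chip the same distribution; alpha=0.0 returns the
    data unchanged; values in between blend linearly (hypothetical knob).
    Ties are broken by position rather than averaged.
    """
    if not chips:
        raise ValueError("need at least one chip")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    n = len(chips[0])
    if any(len(c) != n for c in chips):
        raise ValueError("all chips must have the same number of values")
    # Reference distribution: mean of the k-th smallest value across chips.
    sorted_chips = [sorted(c) for c in chips]
    target = [sum(col) / len(chips) for col in zip(*sorted_chips)]
    result = []
    for chip in chips:
        order = sorted(range(n), key=lambda i: chip[i])  # indices by rank
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = (1 - alpha) * chip[i] + alpha * target[rank]
        result.append(out)
    return result


chips = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
print(quantile_normalize(chips))             # [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
print(quantile_normalize(chips, alpha=0.5))  # [[5.25, 1.75, 3.25], [3.75, 1.25, 5.75]]
```

Unlike matching means or variances, which only shifts and rescales each chip, this mapping changes the shape of each chip's distribution, which is the problem described above. Whether to apply it to raw or log-scale intensities is left open here.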
Q: How can people get access to the software?
A: The software is free to all researchers and available for download at http://dot.ped.med.umich.edu:2000/ourimage/pub/shared/Affymethods.html. This includes the readaffy software, the normalization program, and a Java-based annotation program, developed by Jean-Marie Rouillard, a postdoc in our lab, for BLASTing probe-set sequences against sequences that have been assigned to UniGene clusters. Affymetrix has recently revised its software, and many or all of the things we disliked have been improved. It includes a new algorithm that does not assume a normal distribution of the data, and testing may well show that it is as good as or better than our current software. I haven't tested it yet, but I would suggest that people try our software alongside the new Affymetrix data analysis software.