AT A GLANCE
Tony Yuen, research assistant, Department of Neurology, Mount Sinai School of Medicine, working in the lab of Stuart Sealfon, professor of neurology, pharmacology and biological chemistry
PhD candidate, Neurology, Mount Sinai School of Medicine, expected Fall 2002.
Recently published a paper in Nucleic Acids Research, “Accuracy and calibration of commercial oligonucleotide and custom cDNA microarrays” (Nucleic Acids Research, Vol. 30, No. 10, e48), with postdoctoral researcher Elisa Wurmbach, Robert Pfeffer, Barbara Ebersole, and Stuart Sealfon as co-authors.
How did you get into microarrays?
We [at Stuart Sealfon’s lab] want to know what genes are induced by activation of the GnRH [Gonadotropin releasing hormone] receptor under different GnRH stimulation frequencies. Microarrays are great tools to [discover] the candidate genes.
In your Nucleic Acids Research paper, where you compared Affymetrix and cDNA arrays using a gonadotrope cell line, you used Affymetrix U74 mouse chips. As you know, these are the chips containing the incorrect sequence. Did you do this experiment as a way to utilize these otherwise unusable chips?
The experiment was done before Affymetrix realized that some of the oligos had been designed against the sense strand. When we learned about the incorrectly designed oligos, we used the probe mask provided by Affymetrix to eliminate those probe sets (about 2,000 clusters) and re-analyzed the data. Using real-time PCR, we were able to confirm most of the candidate genes obtained from this Affy experiment. Since about 10,000 clusters remain usable, we think that this is good enough for our purpose. Affymetrix did send us replacement U74Av2 chips, but I don’t think we would learn anything new by repeating the experiments with the newer version. Most of the affected clusters are ESTs anyway. The results in the NAR paper were obtained from data analysis with the probe mask.
Now you compared these chips to cDNA chips with 956 clones selected from the National Institute on Aging 15K library and spotted with a GMS (now Affymetrix) 417 arrayer on Corning GAPS-coated slides using a Cy3/Cy5 dye labeling system. What led you to use this protocol, and did you look at potential dye bias in the labeling of the cDNA chips?
Details of the development of our custom-printed array can be found in a Journal of Biological Chemistry paper that I co-authored with Elisa Wurmbach, Barbara Ebersole, and Stuart Sealfon, “Gonadotropin-releasing hormone receptor-coupled gene network organization.” (J Biol Chem 2001 Dec 14; 276(50):47195-201.) There is no bias in the labeling procedure. We swapped the dyes and got exactly the same results. The same experiment was done using Affy U74A chips in order to provide wider gene coverage.
In the experiments you used three replicate cDNA arrays, each with a control and experimental sample, and six Affymetrix chips, three with an experimental and three with a control sample. Is there a magic number of replicates you need to get robust results?
We tried to do the experiment in triplicate. With duplicates, if there is a discrepancy between the two measurements, it is difficult to tell which one is the outlier. Since we are using a cell line that is homogeneous in nature, from our experience we think that it is not necessary to do a large number of replicates. There is not much variability between cells grown on different dishes. On the other hand, for samples that are more complex, such as tissue samples that contain different cell types, or samples that come from different animals, more replicates will be required.
Also, with the six Affymetrix chips, couldn’t there be some chip-to-chip variability that would not be present on a cDNA array, where the control and experimental sample are hybridized to the same chip?
We always use the same lot of Affy chips in an experiment. From our experience, the chip-to-chip variation within the same lot number is minimal. In order to confirm this (and all experiments that we’ve done with Affy chips), we did binary comparisons of arrays with control vs. control [samples] (C1 vs. C2, C1 vs. C3, and C2 vs. C3) and with experimental vs. experimental (E1 vs. E2, E1 vs. E3, and E2 vs. E3). Using a combination of the Affymetrix difference call algorithm and our selection criteria, none of the genes were found to be regulated. We also used scatter plots to visualize the data. Everything stayed on the 45-degree diagonal line when we compared control with control, or experimental with experimental. This suggests that in addition to chip fabrication, cell treatment, RNA preparation, probe labeling, and sample hybridization are all highly reproducible.
When you compared the two different kinds of chips, you found that they gave comparable results, identifying 16 out of 17 differentially regulated transcripts among the 47 genes you chose, but both underestimated the fold change in expression compared to quantitative real-time PCR. Why do you think this bias exists?
I think the bias comes from non-specific hybridization and a saturation effect on the microarray platform. On the Affy chip, a single 25-mer oligo is not specific enough for any gene (that’s why Affy uses 16 to 20 pairs of oligos to query each gene). On the cDNA array, non-specific hybridization can occur with homologous DNA sequences, such as [those] from members of the same gene family [as the target sequence]. A high-stringency wash should be able to resolve this, but usually the washes in the array protocols are not stringent enough. Limited binding sites on the chip and/or the limited range of the scanner constitute the saturation effect.
For example, let’s say the non-specific signal plus background is 100 counts. If the specific signal is 10,000 counts for the control and 30,000 counts for the experimental sample, the apparent signals on the array will be 10,100 and 30,100, and the signal from non-specific hybridization is negligible. However, if the specific signal is 100 counts for the control and 300 counts for the experimental, the apparent signals on the array will be 200 (100 [background] + 100 [specific signal]) and 400 (100 [background] + 300 [specific signal]). Instead of a three-fold change, the array will report only two-fold. If the specific signals are 10 and 30, you will get 110 and 130 on the array, and will not even get the increase call. But this doesn’t just mean that lower-expressed genes are less detectable. What matters is the ratio between the non-specific signal and the specific signal. Therefore, if the non-specific signal is 10,000 counts, the fold change as reported by the array will always be diminished no matter how strong the specific signal is, assuming that the output of the scanner saturates at about 60,000 counts. We know that some genes, such as many of the immediate early genes, are normally expressed in low numbers but can be induced to very high levels (over 1,000-fold). Because of this bias in the array platform, a [1,000-fold] expression change might be reported as less than 100-fold.
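The arithmetic in this example can be sketched in a few lines of Python. The background, signal, and 60,000-count saturation figures are the ones quoted above, used purely for illustration; the function name and form are not from the paper:

```python
def apparent_fold_change(control, experimental, background=100, ceiling=60000):
    """Fold change as the array would report it: each true signal is offset
    by a constant non-specific background, and the observed counts are
    clipped at the scanner's saturation ceiling."""
    observed_control = min(control + background, ceiling)
    observed_experimental = min(experimental + background, ceiling)
    return observed_experimental / observed_control

# Strong signals: the background is negligible, near the true 3-fold.
print(apparent_fold_change(10000, 30000))   # ≈ 2.98

# Weak signals: the same background turns a 3-fold change into 2-fold.
print(apparent_fold_change(100, 300))       # 2.0

# A 1,000-fold induction read against a 10,000-count non-specific signal,
# with the high end clipped at the ceiling, is compressed to about 6-fold.
print(apparent_fold_change(50, 50000, background=10000))
```

The ratio of non-specific to specific signal, not the absolute expression level, is what drives the compression, as the third call shows.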
Now you only obtained 16 out of 17 correct calls for the oligo arrays when you used Affymetrix Microarray Suite 5.0 to analyze the data. When you used 4.0, you found that the Affymetrix arrays were in fact less accurate than the cDNA arrays, identifying only 14 out of 17 differentially expressed genes. Would you say this difference in performance from 4.0 to 5.0 is significant, and have you found 5.0 to work generally better than 4.0?
From my experience, Microarray Suite 4.0 gives false negatives and Microarray Suite 5.0 gives false positives using the default settings (in addition to our detection criteria). If an exhaustive list of regulated genes is needed, 5.0 can be used for data analysis. The false positives should be eliminated after careful validation of candidate genes. In contrast, if the list of regulated genes does not have to be comprehensive, 4.0 will perform better because it generates a smaller list with fewer false positives to start with.
You came up with a mathematical method to correct for bias in underestimating the difference in expression levels between experimental and control samples with the cDNA arrays, but not with oligo arrays. So what should users do about the oligo array results?
That’s a difficult question. We have not been able to develop a mathematical model to correct the Affy fold change. Therefore, I think users should keep in mind that the fold changes, especially the larger ones, are probably underestimated in their Affy experiments.
Also, your fold change correction method for the cDNA arrays increases the CV for cDNA arrays from 20.2 percent to 33.6 percent. Does this mean you need to do more replicates?
More replicates will not help. After correction, the fold changes scatter around the real value with a slightly higher standard deviation. Increasing the sample size will not improve the SD value.
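This point can be illustrated with a toy simulation. The compression factor and noise level below are invented for illustration, and the "correction" is just a rescaling, not the paper's actual method: multiplying each compressed measurement by a correction factor recenters the values on the true fold change but scales the noise along with them, and the per-measurement SD does not shrink as more replicates are added (only the SD of the mean does):

```python
import random

random.seed(1)

TRUE_FOLD = 3.0
COMPRESSION = 0.67              # hypothetical array bias: reports ~2/3 of truth
CORRECTION = 1 / COMPRESSION    # rescaling applied to each measurement

def measure():
    """One compressed, noisy fold-change reading (illustrative model only)."""
    return random.gauss(TRUE_FOLD * COMPRESSION, 0.2)

def sd(xs):
    """Sample standard deviation."""
    m = sum(xs) / len(xs)
    return (sum((x - m) ** 2 for x in xs) / (len(xs) - 1)) ** 0.5

# At every replicate count, the corrected SD is the raw SD scaled up by
# the correction factor; adding replicates does not reduce it.
for n in (3, 30, 300):
    raw = [measure() for _ in range(n)]
    corrected = [CORRECTION * x for x in raw]
    print(n, round(sd(raw), 3), round(sd(corrected), 3))
```

The corrected values average out to the true fold change, but each individual measurement is noisier than before, which matches the reported jump in CV from 20.2 to 33.6 percent in spirit.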
Now you also came up with a new mathematical model for estimating the efficiency of PCR. Are you planning to publish this model on its own?
The model is already described in the Nucleic Acids Research paper and I don’t think we will write a paper on the model alone.
Also, what are you doing with microarrays now?
Actually, I spend much more time on real-time PCR than on microarrays. I can do an Affy experiment in about two weeks, from cell treatment to RNA preparation to labeling to hybridization to data analysis. However, it takes me months to follow up those candidate genes. To us, array screening is for discovery of candidate genes. Further experiments, such as detailed time-course or dose-response studies, are done using quantitative PCR. (For example, see “Coupling of GnRH concentration and the GnRH receptor-activated gene program,” Mol Endocrinol 2002 Jun;16(6):1145-53.) The Nucleic Acids Research paper is just a little sidetrack. Since we have data on cDNA and on oligo arrays that compare essentially the same samples, we thought that many people would be interested to see how these two platforms compare.