NEW YORK, April 2 - One of the biggest pitfalls in analyzing microarray data is the process of normalizing gene expression levels across two or more chips, or removing systematic variation between the chips when comparing different experiments, researchers have said.
“It's the same as if you are pointing a telescope into deep space and measuring how bright a star is,” said Nat Goodman, senior vice president of the bioinformatics consulting firm 3rd Millennium in Cambridge, Mass. “If you are comparing data from two different telescopes, or from the same telescope on different nights, one clear and one cloudy, you have to take that into account.”
“What's striking in the microarray arena,” Goodman noted, “is that people have simply ignored this issue.”
One of the major problems in this area is that Affymetrix, the leader in the microarray field, has not published a statistical error model for its experiments, which means that researchers do not know how much to adjust their data for variations in spot intensity, hybridization patterns, and intensity measurement sensitivity.
While Affymetrix software does normalize gene expression patterns across chips, statisticians say the algorithms it uses are unreliable because they assume that two different chips will have a comparable number of genes expressed, with the same average expression levels – in other words, that the distributions of expression in different chips can be simply moved along an axis, inflated or deflated to provide comparisons.
“What Affymetrix does is they take the distribution of all gene expression on a chip, and they chop the tails of it, because distributions can have very long tails,” said Michael Recce, director of the Center for Computational Biology and Bioengineering at the New Jersey Institute of Technology. “Then they take the mean of the distribution, and compute the ratio of the mean so they can actually multiply the scaling factor times [the mean] and bring it up to match the other one.”
But this method does not work when the average expression level is higher on one chip than another. With a higher expression level, more gene expression signals will survive the statistical cutoff performed by chopping off the tail ends of the distribution, Recce said. “And if you move the distribution away from the cliff, you catch more of the genes, and have a higher mean.”
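The scaling step Recce describes can be sketched in a few lines of Python. This is an illustration of the general idea only, not Affymetrix's actual code: the function names are hypothetical, and the 2 percent trim on each tail is an assumed value, since no trimming fraction is given here.

```python
def trimmed_mean(values, trim_fraction=0.02):
    """Mean after dropping the lowest and highest trim_fraction of values.

    trim_fraction=0.02 is an assumed value chosen for illustration.
    """
    ordered = sorted(values)
    k = int(len(ordered) * trim_fraction)
    kept = ordered[k:len(ordered) - k] if k else ordered
    return sum(kept) / len(kept)


def scale_to_reference(chip, reference, trim_fraction=0.02):
    """Multiply every intensity on `chip` by one factor so that its
    trimmed mean matches the trimmed mean of `reference`."""
    factor = trimmed_mean(reference, trim_fraction) / trimmed_mean(chip, trim_fraction)
    return [value * factor for value in chip], factor
```

Because a single multiplicative factor is applied to every gene, the sketch can only stretch or shrink one distribution to match the other, which is the assumption the statisticians object to.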
Affymetrix has not responded to several requests for comment.
Bioinformatics companies such as Rosetta Inpharmatics and Imaging Research, a majority-owned subsidiary of Amersham Pharmacia Biotech, have taken advantage of this statistical quagmire, offering software programs that provide for more robust analysis and normalization of data. Many price-conscious microarray users, however, cannot afford to invest in this software on top of the still expensive microarrays.
For this community, Recce and other bioinformatics experts are addressing the normalization problem head-on, in an effort to come up with valid statistical models that all scientists can use to normalize and analyze Affymetrix data, as well as data from cDNA microarrays.
Recce has taken a conservative approach when comparing chips, first making note of Affymetrix’s “absent” or “present” calls, and determining whether a gene is expressed or not expressed on each chip. Then Recce ranks genes that are expressed on a single chip from highest to lowest level of expression. On the chip with more genes expressed, he takes the ones with the lowest expression levels and removes them, so that both chips have a comparable number of genes expressed. “That's roughly equivalent to shifting the good distribution down, so it's as bad as the other distribution. Then you can do a realistic comparison.”
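A rough Python sketch of that truncation follows. The data layout (a dictionary mapping each gene to a signal and a "P" or "A" call) and the function name are assumptions made for illustration, not Recce's own implementation.

```python
def scale_down(chip_a, chip_b):
    """Trim the chip with more expressed genes down to the size of the other.

    Each chip is a dict: gene name -> (signal, call), where call is
    "P" (present) or "A" (absent). This layout is hypothetical.
    Returns two lists of (gene, signal), ranked from highest to lowest signal.
    """
    def expressed_ranked(chip):
        present = [(gene, signal) for gene, (signal, call) in chip.items() if call == "P"]
        return sorted(present, key=lambda pair: pair[1], reverse=True)

    ranked_a = expressed_ranked(chip_a)
    ranked_b = expressed_ranked(chip_b)
    # Keep only as many genes as the chip with fewer expressed genes has,
    # which drops the lowest-expressed genes from the larger list.
    n = min(len(ranked_a), len(ranked_b))
    return ranked_a[:n], ranked_b[:n]
```

The comparison that follows would then be made on two lists of equal length, at the cost of discarding the weakest signals from the better chip.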
Recce acknowledges this method of “scaling down” is not ideal for researchers who are trying to identify all of the genes that are involved in a particular signaling pathway, since these researchers will want to detect every single gene expressed. But if arrays are being used to determine whether a particular drug has an effect, or as a diagnostic tool for different forms of cancer, “you're better off scaling down the good data rather than scaling up the bad data,” he said. In other words, the array comparison tool may not be as sensitive as it could be if all expression differences were detected, but the dataset will still have internal validity.
This method, however, also has other drawbacks, said Goodman. Throwing away data points that indicate a low level of expression “forces you into the low hanging fruit, the easily detected situations,” he said. “And now that your average scientist wants to use [microarrays] to study something that's hard, the methods that have gotten us this far are not going to get us to the next step.”
Eric Schadt, of the biomathematics department of the University of California, Los Angeles, Wing Wong, of the Harvard School of Public Health's statistics and biostatistics departments, and two other statisticians, Cheng Li and Cheng Su, have also proposed an alternative method of normalization, in a recent article that appeared in the February issue of the Journal of Cellular Biochemistry.
The group applied a standard nonlinear curve-fitting technique, the smoothing spline, to a scatter plot of the data on two arrays to normalize data in the common situation where the relationship between the two data sets is non-linear, i.e. one set cannot simply be scaled up to match the other.
“This procedure performs well when the two samples to be compared have a small number of differentially expressed genes,” Wong said.
For cases where two samples diverge widely, the group developed a rank-selection method somewhat similar to Recce’s. “This method first selects a set of genes with the property that the rank of a gene in this set according to its expression measurement in one array is similar to its rank using values for the second array,” Wong said. “Since genes selected this way tend to be non-differentially expressed, they will form a valid basis for the computation of a normalization relation.”
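The selection step Wong describes might look something like the following Python sketch. It is not the group's published procedure: the function names are hypothetical, and the tolerance for how far a gene's rank may shift between arrays (15 percent of the gene count here) is an assumed value picked for illustration.

```python
def ranks(values):
    """Rank of each value within its array (0 = smallest)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order):
        result[index] = rank
    return result


def rank_invariant_indices(array_a, array_b, max_rank_shift=0.15):
    """Indices of genes whose rank is similar on both arrays.

    A gene is kept if its rank differs between the arrays by no more
    than max_rank_shift * n, where n is the number of genes.
    max_rank_shift=0.15 is an assumed tolerance.
    """
    n = len(array_a)
    ranks_a, ranks_b = ranks(array_a), ranks(array_b)
    tolerance = max_rank_shift * n
    return [i for i in range(n) if abs(ranks_a[i] - ranks_b[i]) <= tolerance]
```

A normalization curve would then be fitted through the selected genes only, so that genes whose expression genuinely changed between samples do not distort the fit.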
Schadt, Wong, and their colleagues have used this ranking procedure to develop dChip, an expression array analysis software tool that academic and other non-profit researchers will be able to freely access. Wong is still testing dChip, but anyone who wants to have early access can contact him at [email protected].
Terry Speed, a statistician at the University of California, Berkeley, has applied statistical methods to normalize cDNA data. His calculations take into account not only the differences in numbers of genes expressed between chips, but also the inter-chip variations in intensity of green Cy3 and red Cy5 dyes, and the error built into the print head. His statistical methods are available on his website, www.stat.berkeley.edu/users/terry/zarray/Html/normspie.html.
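To give a sense of what correcting for dye and print-head effects can involve, here is a deliberately simple Python sketch. It is a stand-in, not Speed's actual procedure: it centers the log2 ratio of red to green intensity on its median within each print-tip group, and the input layout and function name are assumptions made for illustration.

```python
import math
from collections import defaultdict
from statistics import median


def center_log_ratios(spots):
    """Median-center log2(Cy5/Cy3) within each print-tip group.

    `spots` is a list of (print_tip, cy5, cy3) tuples with positive
    intensities; this layout is hypothetical. Returns one centered
    log ratio per spot, in the input order.
    """
    log_ratios = [math.log2(cy5 / cy3) for _, cy5, cy3 in spots]

    # Collect the log ratios belonging to each print tip.
    by_tip = defaultdict(list)
    for (tip, _, _), log_ratio in zip(spots, log_ratios):
        by_tip[tip].append(log_ratio)

    # Subtract each tip's median so a typical spot shows no dye bias.
    tip_median = {tip: median(values) for tip, values in by_tip.items()}
    return [log_ratio - tip_median[tip] for (tip, _, _), log_ratio in zip(spots, log_ratios)]
```

The underlying assumption is that most genes printed by a given tip are not differentially expressed, so the median log ratio for that tip should sit at zero once dye and print-head bias are removed.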
But the issue of how to properly normalize data is still not settled, said Goodman. “Researchers need to take to heart the statistical literature that’s out there, just as users need to be much more compulsive about requiring statistical rigor before they believe the statistical results that are coming from their data.” And meanwhile, he recommended: “Do a lot of confirmation studies. Don't trust the numbers that the computer tells you.”