To tackle the mountain of issues involved in microarray data analysis, David Allison, a biostatistician from the University of Alabama at Birmingham, is bringing together a diverse group of researchers for a retreat at Mohonk Mountain House in New Paltz, NY, from September 10 through 13.
At the retreat, Allison plans to focus discussion on microarray-specific issues in the five major categories of statistical analysis: measurement, design, classification, inference, and estimation.
In the area of measurement, one issue is how to deal with the negative fold changes that sometimes appear.
"With the Affymetrix chip, you subtract the mismatch from the perfect match. Sometimes the mismatch is higher than the perfect match, but you can't have negative gene expression, so what do you do?" Allison asked. Some researchers make the fold change zero, or throw out the data point, but Allison has proposed keeping the negative number and using it as a relative measurement.
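The three options described above can be sketched in a few lines of Python. This is only an illustration, not Affymetrix's or Allison's actual procedure: the function name `pm_minus_mm`, the `policy` labels, and the probe intensities are all hypothetical.

```python
def pm_minus_mm(pm, mm, policy="keep"):
    """Subtract mismatch (MM) from perfect-match (PM) intensities.

    policy="zero": replace negative differences with 0.0
    policy="drop": discard negative differences
    policy="keep": keep negative differences as relative measurements
    """
    if len(pm) != len(mm):
        raise ValueError("pm and mm must have the same length")
    if policy not in ("zero", "drop", "keep"):
        raise ValueError(f"unknown policy: {policy}")

    result = []
    for p, m in zip(pm, mm):
        diff = p - m
        if diff < 0 and policy == "zero":
            result.append(0.0)
        elif diff < 0 and policy == "drop":
            continue  # the data point is thrown out
        else:
            result.append(diff)
    return result


# Hypothetical intensities; the second probe pair has MM > PM.
pm = [1200.0, 310.0, 95.0]
mm = [400.0, 350.0, 90.0]

zeroed = pm_minus_mm(pm, mm, policy="zero")   # [800.0, 0.0, 5.0]
dropped = pm_minus_mm(pm, mm, policy="drop")  # [800.0, 5.0]
kept = pm_minus_mm(pm, mm, policy="keep")     # [800.0, -40.0, 5.0]
```

Only the "keep" policy preserves the information that the second probe pair read lower than the third, which is the sense in which a negative value can still serve as a relative measurement.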
A further question involves whether fold change can be used as a valid metric for array analysis at all. "It's not clear what people mean when they say that fold change has such and such effect without referring to sample size," Allison said. "A fold change of ten might not even be significant in a tiny sample size, whereas 1.001 might be significant if a sample is big."
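Allison's point about sample size can be illustrated with a small permutation test. The sketch below is not taken from his work: the function name `permutation_p_value`, the log2 expression values, and the noise level are synthetic assumptions chosen only to make the contrast visible.

```python
import math
import random
from itertools import combinations


def permutation_p_value(group_a, group_b, max_exact=20_000, n_random=200, seed=0):
    """Two-sided permutation p-value for a difference in group means.

    Every relabelling is enumerated when there are at most `max_exact`
    of them; otherwise `n_random` random relabellings are drawn.
    """
    pooled = list(group_a) + list(group_b)
    n_a, n_b = len(group_a), len(group_b)
    total = sum(pooled)

    def abs_diff(sum_a):
        # |mean of relabelled group A - mean of relabelled group B|
        return abs(sum_a / n_a - (total - sum_a) / n_b)

    observed = abs_diff(sum(group_a))
    tol = 1e-12  # guards against floating-point ties

    if math.comb(len(pooled), n_a) <= max_exact:
        diffs = [abs_diff(sum(c)) for c in combinations(pooled, n_a)]
        return sum(d >= observed - tol for d in diffs) / len(diffs)

    rng = random.Random(seed)
    hits = 0
    for _ in range(n_random):
        rng.shuffle(pooled)
        if abs_diff(sum(pooled[:n_a])) >= observed - tol:
            hits += 1
    return (hits + 1) / (n_random + 1)


# Tiny sample: three arrays per group, a tenfold change on the log2 scale.
control_small = [8.0, 9.1, 7.4]
treated_small = [x + math.log2(10) for x in control_small]
p_small = permutation_p_value(control_small, treated_small)

# Big sample: 1,000 arrays per group, a 1.001-fold change, very low noise.
rng = random.Random(1)
control_big = [rng.gauss(8.0, 0.003) for _ in range(1000)]
treated_big = [rng.gauss(8.0 + math.log2(1.001), 0.003) for _ in range(1000)]
p_big = permutation_p_value(control_big, treated_big)

print(f"10-fold, n=3 per group:        p = {p_small:.3f}")
print(f"1.001-fold, n=1000 per group:  p = {p_big:.3f}")
```

With three arrays per group there are only 20 ways to split the six values, so the smallest two-sided p-value the test can return is 2/20 = 0.1, however large the fold change. In the large synthetic sample, the 1.001-fold shift falls below the conventional 0.05 threshold.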
In the design area, the group may discuss whether it is better to pool DNA and RNA from a large number of organisms onto one chip, or have one organism per chip.
For classification, Allison would like to reexamine cluster analysis, looking at how to extract meaningful conclusions from it and at possible alternative analysis techniques.
Inference issues that may be addressed at the retreat include the question of whether P-values can be used to determine statistical significance of gene expression changes in arrays.
In the area of estimation, Allison's research group has a paper in press at the journal Computational Statistics and Data Analysis. The paper, "A Mixture Model Approach for the Analysis of Microarray Gene Expression Data," proposes a model wherein researchers can use the data from other genes on a chip to help estimate the expression level of a single gene.
Even if researchers don't adopt the model he proposes, Allison said, it is important that they adopt some objective standard for judging whether gene expression changes are significant. "If a researcher says, 'We have looked at enough mice to know how much variance is there, and 60-fold change in expression is way above that variance,' I am not saying that subjective knowledge is not valid," said Allison. "But what most of us believe that science offers is objectivity. I say, 'Write it down. Show me.'"
In addition to the retreat, Allison said he would set up a visiting scholar program between members of the network. This program would allow scholars at different levels to go to other universities in the network to study with other researchers for a period of time.