Statistical software giant SAS Institute has established a genomics group that will apply its statistical methods to the analysis of gene expression data, Russ Wolfinger, director of the new group, told the New England Bioinformatics Group (NEBiG) last week.
Wolfinger told those gathered for the first NEBiG meeting of 2001 that a statistical approach to gene expression analysis could improve the accuracy of experimental results.
The method he described uses two interconnected mixed linear models to test for statistically significant differences in microarray gene expression data.
The new method arrives amid a host of claims that the bulk of current gene expression experiments are fundamentally flawed because they often overlook statistical significance. Meanwhile, other researchers, such as Eytan Domany of the Weizmann Institute of Science in Israel, are developing new clustering algorithms to help make sense of the vast amount of microarray data.
Wolfinger said most published papers on the topic consider only fold changes in expression, discarding genes that change less than two- or three-fold. That approach, however, could miss a biologically important gene whose 1.2-fold change is reproducible and precisely measured, and therefore statistically significant. Conversely, he said, a gene may show a large fold change on one array but vary widely from array to array, which lowers its statistical significance.
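To make the contrast concrete, here is a minimal Python sketch using invented numbers; the gene names and log2 ratios are hypothetical, not data from Wolfinger's talk. It applies a plain one-sample t-test to replicate log2 ratios. That is a swapped-in textbook technique chosen for brevity, not the mixed-model method Wolfinger described.

```python
import math
import statistics

# Hypothetical log2(treated / control) ratios for two genes, one value per replicate array.
genes = {
    "steady_gene": [0.25, 0.27, 0.26, 0.28],  # about 1.2-fold on every array
    "noisy_gene": [1.8, 0.1, -0.4, 0.9],      # about 3.5-fold on array 1 only
}

TWO_FOLD = 1.0  # a two-fold change is 1.0 on the log2 scale


def summarize(log2_ratios):
    """Return (fold change of the mean log2 ratio, one-sample t statistic)."""
    n = len(log2_ratios)
    mean = statistics.mean(log2_ratios)
    sd = statistics.stdev(log2_ratios)  # sample standard deviation (divides by n - 1)
    t_stat = mean / (sd / math.sqrt(n))
    return 2 ** mean, t_stat


for name, ratios in genes.items():
    fold, t_stat = summarize(ratios)
    # Naive filter: does any single array show at least a two-fold change?
    passes_cutoff = any(abs(r) >= TWO_FOLD for r in ratios)
    print(f"{name}: mean fold change {fold:.2f}, t = {t_stat:.2f}, "
          f"passes two-fold filter on some array: {passes_cutoff}")
```

With four arrays (three degrees of freedom), the two-sided 5% critical value of t is about 3.18. The steady 1.2-fold gene clears it easily (t of roughly 41) yet never passes the two-fold filter, while the noisy gene passes the filter on its first array but has a t of only about 1.25.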
Calling the approach complementary to clustering, he explained that it works “by answering a different question.”
“Clustering looks for similarities. This looks for differences,” he said. By accounting for the variance in expression levels, he added, the approach ensures that “the signal is greater than the noise,” reducing both false positives and false negatives.
Another benefit of the method, according to Wolfinger, is that it works just as well without a reference sample. Instead, he said, it uses a circular design: the first sample is compared with the second, the second with the third, and so on, until the final sample is compared with the first. Although some NEBiG attendees found this claim counterintuitive, Wolfinger said that, unlike clustering, which has to “compare apples with apples,” his method's statistical model accounts for the differences in variability on which it relies.
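The circular design can be sketched in a few lines of Python. The sample names and expression levels below are hypothetical, and the noise-free arithmetic only illustrates why no common reference is needed: every pair of samples is linked by a chain of arrays, so their difference can be recovered by adding log ratios along the loop. With real, noisy data the different paths disagree, and a statistical model (in Wolfinger's case, presumably the mixed models described above) has to reconcile them; this sketch does not attempt that step.

```python
# Hypothetical true log2 expression levels of one gene in four samples.
levels = {"S1": 5.0, "S2": 6.5, "S3": 4.0, "S4": 5.5}


def loop_design(samples):
    """Pair each sample with the next; the last sample wraps around to the first."""
    n = len(samples)
    return [(samples[i], samples[(i + 1) % n]) for i in range(n)]


def path_difference(pairs, log_ratios, start, end):
    """Sum per-array log2 ratios walking forward around the loop from start to end."""
    total = 0.0
    current = start
    for _step in range(len(pairs)):
        if current == end:
            return total
        # Find the array on which `current` is the first sample of the pair.
        idx = next(i for i, (first, _second) in enumerate(pairs) if first == current)
        total += log_ratios[idx]
        current = pairs[idx][1]
    raise ValueError(f"{end!r} is not reachable from {start!r}")


pairs = loop_design(list(levels))
# Noise-free array readout: log2(first / second) for each hybridized pair.
log_ratios = [levels[a] - levels[b] for a, b in pairs]

print(pairs)
print(path_difference(pairs, log_ratios, "S1", "S3"))  # S1 and S3 never share an array; → 1.0
print(sum(log_ratios))  # going all the way around the loop returns to zero; → 0.0
```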
But it remains to be seen whether the approach will help biologists and statisticians find common ground, as Wolfinger envisions. Noting that the method requires four or more replicates, one attendee remarked that her lab never runs a microarray experiment more than twice because of time and cost constraints.
“I’m a statistician,” Wolfinger quipped, “of course I’m going to tell you to keep doing it over and over again if you can afford it.”
Wolfinger developed the method in collaboration with Greg Gibson, of North Carolina State University; Elizabeth Wolfinger, of Meredith College; and researchers from the National Institute of Environmental Health Sciences.
The team has submitted a paper on the approach to a leading journal.