NEW YORK, Aug. 31 – Responding to customers’ dissatisfaction with its existing data analysis methods, Affymetrix will introduce a new, improved algorithm for analysis of GeneChip arrays by the end of the year, said Tarif Awad of Affymetrix’s genomics collaborations group, speaking at the second annual Northwest Microarray Conference in Seattle on August 17th.
The company’s existing empirical algorithm “has black boxes,” and “is not based on a statistical approach,” Awad told an assembled audience of microarray researchers. “It also generates negative values for some mostly absent probe sets, and has unintuitive parameters,” he added. “Users are not crazy about it.”
Statisticians have publicly criticized the company’s existing algorithm for its assumption that microarray gene expression data fits a normal distribution. Awad acknowledged this problem. “Probe pair data is not necessarily normally distributed, so it is not appropriate to use an algorithm that assumes normal distribution,” he said. “We need to use a non-parametric test.”
Awad went on to describe the new algorithm, which employs the Wilcoxon Rank-Sum test, a classic statistical test that does not require data to fit into a normal distribution curve.
In this test, data values for gene expression in a probe set are assigned numerical ranks. The sum of the ranks in each group becomes the test statistic for that group. From these test statistics, p-values are computed to determine whether the difference between the probe set and a null hypothesis, or between two comparison groups, is significant.
The ranking system is applied both in absolute analysis, which determines whether expression levels are significantly above background, and in comparison analysis between two sets of data (for example, diseased tissue mRNA vs. normal tissue mRNA).
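The ranking procedure can be illustrated with a minimal Python sketch of a generic two-group rank-sum test. It is not Affymetrix's implementation, which had not been published at the time of the talk. The function names, the averaging of tied ranks, and the normal approximation for the p-value (without a tie correction) are all assumptions made for illustration.

```python
import math

def average_ranks(values):
    """Rank values from 1..n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def rank_sum_test(group_a, group_b):
    """Return (rank sum of group_a, two-sided p-value).

    Illustrative assumption: the p-value uses the large-sample normal
    approximation with no tie correction, so it is rough for small groups.
    """
    n_a, n_b = len(group_a), len(group_b)
    ranks = average_ranks(list(group_a) + list(group_b))
    w = sum(ranks[:n_a])                      # test statistic for group_a
    mean_w = n_a * (n_a + n_b + 1) / 2        # expected rank sum under the null
    sd_w = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = (w - mean_w) / sd_w
    p = math.erfc(abs(z) / math.sqrt(2))      # two-sided tail probability
    return w, p

if __name__ == "__main__":
    # Hypothetical intensities for two comparison groups.
    print(rank_sum_test([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))
```

Because only the ordering of the values enters the calculation, the result does not depend on the data following a normal distribution, which is the property Awad cited.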
The test does not require that users throw out outliers in a dataset, a practice in Affymetrix’s previous algorithm that statisticians had questioned due to its potential to skew the data toward more highly expressed genes. Instead, a procedure called Tukey’s Biweight Estimate is used to weight data points depending on distance from the median, Awad said.
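A one-step version of Tukey's biweight can be sketched as follows. The article does not give the details Affymetrix uses, so the tuning constant `c`, the small `epsilon` that guards against division by zero, and the use of the median absolute deviation as the scale are illustrative assumptions, as is the function name.

```python
from statistics import median

def tukey_biweight(values, c=5.0, epsilon=1e-4):
    """One-step Tukey biweight estimate of the center of `values`.

    Points are down-weighted smoothly with distance from the median;
    points farther than c scaled deviations away receive weight zero.
    c and epsilon are illustrative choices, not taken from the article.
    """
    m = median(values)
    # Median absolute deviation: a robust measure of spread.
    mad = median([abs(x - m) for x in values])
    scale = c * mad + epsilon
    weights = []
    for x in values:
        u = (x - m) / scale
        weights.append((1 - u * u) ** 2 if abs(u) < 1 else 0.0)
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)

if __name__ == "__main__":
    # Hypothetical probe values with one extreme point.
    print(tukey_biweight([1.0, 2.0, 3.0, 4.0, 100.0]))
```

In this sketch the extreme value stays in the dataset but receives zero weight, so it does not pull the estimate upward the way it would pull a plain mean.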
Outside experts have also noted that Affymetrix’s existing algorithm fails to include a published statistical error model for its experiments, which means researchers have not known how much to adjust their data for variations in spot intensity, hybridization patterns, and intensity measurement sensitivity.
The new algorithm, which is to be incorporated in new versions of the microarray analysis software the company is planning to release in the fourth quarter, is designed to address these issues, Awad said.
The new algorithm has “tunable parameters,” said Awad. These include two separate sets of significance thresholds for GeneChip data.
Company scientists sought to validate the algorithm with a large well-characterized dataset of human yeast samples, using a Latin square experimental design (a matrix of unique data points). They used 14 samples (the rows in this square) and exposed each sample to 14 different RNA spike concentrations (the columns), resulting in a square with every spike group at every concentration. They examined one chip at a time, and then compared the chips using both the old empirical algorithm and the new algorithm. Their findings indicated that the new algorithm was “at least as robust as the old algorithm, and also provides the benefit of statistical confidence,” Awad said.
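The structure of such a design can be sketched in a few lines of Python. The article does not say how Affymetrix laid out its square, so the cyclic construction below, the function names, and the use of integer indices in place of actual concentrations are assumptions for illustration only.

```python
def cyclic_latin_square(n):
    """Build an n x n Latin square by cyclic shifting.

    Entry [row][col] is the index of the concentration level (0..n-1)
    assigned to that cell; the cyclic layout is an illustrative assumption.
    """
    return [[(row + col) % n for col in range(n)] for row in range(n)]

def is_latin_square(square):
    """Check that every symbol 0..n-1 appears exactly once per row and column."""
    n = len(square)
    symbols = set(range(n))
    rows_ok = all(len(row) == n and set(row) == symbols for row in square)
    cols_ok = rows_ok and all(
        {square[r][c] for r in range(n)} == symbols for c in range(n)
    )
    return rows_ok and cols_ok

if __name__ == "__main__":
    square = cyclic_latin_square(14)
    print(is_latin_square(square))
    for row in square[:3]:
        print(row)
```

With n set to 14, each row and each column contains every concentration level exactly once, which is what allows every spike group to be observed at every concentration.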
This new algorithm “is certainly a move in the right direction, and I am happy to see Affymetrix responding to the concerns that have been raised by their customers,” said Michael Recce, director of the Center for Computational Biology and Bioengineering at the New Jersey Institute of Technology and a GeneChip user who has sought to improve upon Affy’s algorithms.
Statisticians attending the Seattle conference also indicated that Awad’s presentation looked interesting, but said they would have to see more details to evaluate the robustness of the algorithm.
Awad said Affymetrix scientists are planning to publish the algorithm in a major computational biology journal soon.
Meanwhile, the new algorithm is to be introduced in year-end upgrades of Affymetrix software.
A version of this story originally appeared in BioArray News, a weekly biochip and microarray newsletter. For more information, go to www.bioarraynews.com.