Affymetrix to Add New Microarray Data Analysis Algorithm to Software Upgrade

Responding to customers’ dissatisfaction with its existing data analysis methods, Affymetrix will introduce a new, improved algorithm for analysis of GeneChip arrays by the end of the year, said Tarif Awad of Affymetrix’s genomics collaborations group, speaking at the second annual Northwest Microarray Conference in Seattle last week.

The company’s existing empirical algorithm “has black boxes,” and “is not based on a statistical approach,” Awad told an assembled audience of microarray researchers. “It also generates negative values for some mostly absent probe sets, and has unintuitive parameters,” he added. “Users are not crazy about it.”

Statisticians have publicly criticized the company’s existing algorithm for its assumption that microarray gene expression data fits a normal distribution. Awad acknowledged this problem. “Probe pair data is not necessarily normally distributed, so it is not appropriate to use an algorithm that assumes normal distribution,” he said. “We need to use a non-parametric test.”

Awad went on to describe the new algorithm, which employs the Wilcoxon Rank-Sum test, a classic statistical test that does not require data to fit into a normal distribution curve.

In this test, data values for gene expression in a probe set are pooled and assigned numerical ranks. The sum of the ranks in each group becomes the test statistic for that group. From these test statistics, p-values are computed to determine whether the difference between the probe set and a null hypothesis, or between two comparison groups, is significant.
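The rank-and-sum procedure described above can be sketched in a few lines of Python. This is an illustration of the textbook rank-sum test, not Affymetrix's implementation, which had not been published at the time of the talk. The function names and the sample intensity values are hypothetical, the p-value uses the large-sample normal approximation, and no tie correction is applied to the variance.

```python
import math

def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def rank_sum_test(group_a, group_b):
    """Return (rank sum of group_a, two-sided p-value by normal approximation)."""
    n_a, n_b = len(group_a), len(group_b)
    ranks = average_ranks(list(group_a) + list(group_b))
    w = sum(ranks[:n_a])                      # test statistic for group_a
    mean_w = n_a * (n_a + n_b + 1) / 2        # expected rank sum under the null
    sd_w = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = (w - mean_w) / sd_w
    p = math.erfc(abs(z) / math.sqrt(2))      # two-sided tail probability
    return w, p

# Hypothetical intensities for one probe set in two conditions.
normal_tissue = [110.0, 95.0, 102.0, 99.0, 105.0, 98.0]
diseased_tissue = [150.0, 162.0, 149.0, 171.0, 158.0, 166.0]
w, p = rank_sum_test(normal_tissue, diseased_tissue)
print(f"rank sum = {w}, p = {p:.4f}")
```

Because only the ordering of the values enters the calculation, the result does not depend on the data following a normal distribution. For groups this small, an exact p-value from the rank-sum distribution would be preferable to the normal approximation used here.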

The ranking system is applied in absolute analysis, which determines whether expression levels are significantly above background, as well as in comparison analysis between two sets of data (for example, diseased tissue mRNA vs. normal tissue mRNA).

The test does not require that users throw out outliers in a dataset, a practice in Affymetrix’s previous algorithm that statisticians had questioned due to its potential to skew the data toward more highly expressed genes. Instead, a procedure called Tukey’s Biweight Estimate is used to weight data points depending on distance from the median, Awad said.
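As a rough illustration of weighting by distance from the median, a one-step Tukey biweight estimate can be written as follows. This is a minimal sketch of the general technique, not the company's code. The function name, the tuning constant c = 5, the small epsilon that guards against division by zero, and the sample values are all assumptions made for the example.

```python
import statistics

def tukey_biweight(values, c=5.0, epsilon=1e-4):
    """One-step Tukey biweight: a weighted mean that down-weights far-off points."""
    values = list(values)
    center = statistics.median(values)
    # Median absolute deviation (MAD) sets the scale for "far from the median".
    mad = statistics.median([abs(x - center) for x in values])
    weighted_sum = 0.0
    weight_total = 0.0
    for x in values:
        u = (x - center) / (c * mad + epsilon)   # scaled distance from the median
        weight = (1 - u * u) ** 2 if abs(u) < 1 else 0.0
        weighted_sum += weight * x
        weight_total += weight
    return weighted_sum / weight_total

# Hypothetical probe-pair values with one extreme point.
probe_values = [1.0, 2.0, 3.0, 4.0, 100.0]
print(statistics.mean(probe_values))   # pulled upward by the extreme value
print(tukey_biweight(probe_values))    # stays close to the bulk of the data
```

Weights fall off smoothly as a point moves away from the median and reach zero only beyond c scaled deviations, so no manual decision about which points count as outliers is needed.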

Outside experts have also noted that Affymetrix’s existing algorithm fails to include a published statistical error model for its experiments, which means researchers have not known how much to adjust their data for variations in spot intensity, hybridization patterns, and intensity measurement sensitivity.

The new algorithm, which is to be incorporated in new versions of the microarray analysis software the company is planning to release in the fourth quarter, is designed to address these issues, Awad said.

The new algorithm has “tunable parameters,” said Awad. These include two separate sets of significance thresholds for GeneChip data.

Company scientists sought to validate the algorithm with a large, well-characterized dataset of human and yeast samples, using a Latin square experimental design (a matrix of unique data points). They used 14 samples (the rows in this square) and exposed each sample to 14 different RNA spike concentrations (the columns), resulting in a square with every spike group at every concentration. They examined one chip at a time, and then compared the chips using both the old empirical algorithm and the new algorithm. Their findings indicated that the new algorithm was “at least as robust as the old algorithm, and also provides the benefit of statistical confidence,” Awad said.
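One common way to build such a design is a cyclic Latin square, sketched below in Python. The layout is an assumption made for illustration: rows stand for samples, columns for concentration levels, and each entry names the spike group that receives that concentration in that sample. The article does not describe how Affymetrix actually assigned spike groups, and the function names are hypothetical.

```python
def cyclic_latin_square(n):
    """Build an n x n square in which entry (row, col) is (row + col) mod n."""
    return [[(row + col) % n for col in range(n)] for row in range(n)]

def is_latin_square(square):
    """Check that every symbol 0..n-1 appears exactly once per row and column."""
    n = len(square)
    symbols = set(range(n))
    rows_ok = all(len(row) == n and set(row) == symbols for row in square)
    cols_ok = rows_ok and all(
        {square[r][c] for r in range(n)} == symbols for c in range(n)
    )
    return rows_ok and cols_ok

# 14 samples (rows) by 14 spike concentrations (columns), as in the article.
design = cyclic_latin_square(14)
print(is_latin_square(design))
print(design[0])  # spike group assigned to each concentration in the first sample
```

Reading down any column shows that a given concentration is applied to each spike group in exactly one sample, which is what lets every spike group be observed at every concentration across the 14 chips.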

This new algorithm “is certainly a move in the right direction, and I am happy to see Affymetrix responding to the concerns that have been raised by their customers,” said Michael Recce, director of the Center for Computational Biology and Bioengineering at the New Jersey Institute of Technology and a GeneChip user who has sought to improve upon Affy’s algorithms.

Statisticians attending the Seattle conference also indicated that Awad’s presentation looked interesting, but said they would have to see more details to evaluate the robustness of the algorithm.

Awad said Affymetrix scientists are planning to publish the algorithm in a major computational biology journal soon.

In the meantime, the new algorithm is to be introduced in year-end upgrades of Affymetrix software, including Microarray Suite 5.0, MicroDB Software 3.0, and Affymetrix Data Mining Tool 3.0, along with Affymetrix LIMS Manager 3.0, Affymetrix LIMS (LIMS Server Software) 3.0, and Affymetrix LIMS Development Server Software 3.0.

All current Affymetrix customers who have valid software maintenance agreements can receive the new upgrades free of charge. Any customer who has purchased Affymetrix software within the past year and has completed and returned the enclosed license reply cards has a current maintenance agreement. More information is available on the company’s website, www.affymetrix.com.

Affymetrix is also “building a huge database” that offers “a comprehensive view of the human genome,” on its NetAffx website, Awad said.

NetAffx, which the company introduced in July, gives users access to the 100- to 600-base sequences containing the 25-mer oligonucleotide probes on its GeneChip arrays, and lets them search multiple databases using these sequences. The site may include sequence and cluster visualization tools in the future. “This site is going to grow,” said Awad.

— MMJ
