Affymetrix to Add New Microarray Data Analysis Algorithm to Software Upgrade

Responding to customers’ dissatisfaction with its existing data analysis methods, Affymetrix will introduce a new, improved algorithm for analysis of GeneChip arrays by the end of the year, said Tarif Awad of Affymetrix’s genomics collaborations group, speaking at the second annual Northwest Microarray Conference in Seattle last week.

The company’s existing empirical algorithm “has black boxes,” and “is not based on a statistical approach,” Awad told an assembled audience of microarray researchers. “It also generates negative values for some mostly absent probe sets, and has unintuitive parameters,” he added. “Users are not crazy about it.”

Statisticians have publicly criticized the company’s existing algorithm for its assumption that microarray gene expression data fits a normal distribution. Awad acknowledged this problem. “Probe pair data is not necessarily normally distributed, so it is not appropriate to use an algorithm that assumes normal distribution,” he said. “We need to use a non-parametric test.”

Awad went on to describe the new algorithm, which employs the Wilcoxon Rank-Sum test, a classic statistical test that does not require data to fit into a normal distribution curve.

In this test, data values for gene expression in a probe set are assigned numerical ranks. The sum of the ranks in each group becomes the test statistic for that group. Using these test statistics, p-values are computed to determine whether the difference between the probe set and a null hypothesis, or between two comparison groups, is significant.

The ranking system is applied in absolute analysis, which determines whether expression levels are significantly above background, as well as in comparison analysis between two sets of data (for example, diseased tissue mRNA vs. normal tissue mRNA).
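The two-group comparison form of the rank-sum procedure described above can be sketched in a few lines of Python. This is a textbook illustration, not Affymetrix's implementation: the function names are hypothetical, the p-value uses the large-sample normal approximation without a tie correction, and the input values are made up for the example.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def rank_sum_test(group_a, group_b):
    """Return (rank sum of group_a, two-sided p-value).

    Assumption: normal approximation to the null distribution of the
    rank sum, with no correction for ties.
    """
    n_a, n_b = len(group_a), len(group_b)
    ranks = average_ranks(list(group_a) + list(group_b))
    w_a = sum(ranks[:n_a])                      # test statistic for group A
    mean_w = n_a * (n_a + n_b + 1) / 2          # expected rank sum under H0
    sd_w = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = (w_a - mean_w) / sd_w
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return w_a, p_two_sided

# Made-up intensities for two hypothetical comparison groups.
w, p = rank_sum_test([1, 2, 3], [4, 5, 6])
print(w, p)
```

Because only the ordering of the values enters the statistic, the result does not depend on the data following a normal distribution, which is the property Awad emphasized.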

The test does not require that users throw out outliers in a dataset, a practice in Affymetrix’s previous algorithm that statisticians had questioned due to its potential to skew the data toward more highly expressed genes. Instead, a procedure called Tukey’s Biweight Estimate is used to weight data points depending on distance from the median, Awad said.
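As a rough illustration of how a biweight estimate down-weights distant points rather than discarding them by hand, here is a minimal one-step sketch in Python. The tuning constant `c`, the small `epsilon` guard against division by zero, and the function name are assumptions chosen for the example; the article does not give the values Affymetrix uses.

```python
from statistics import median

def tukey_biweight(values, c=5.0, epsilon=1e-4):
    """One-step Tukey biweight estimate of the center of `values`.

    c and epsilon are illustrative assumptions, not values from the article.
    """
    values = list(values)
    center = median(values)
    # Median absolute deviation: a robust measure of spread.
    mad = median([abs(x - center) for x in values])
    weighted_sum = 0.0
    weight_total = 0.0
    for x in values:
        u = (x - center) / (c * mad + epsilon)  # scaled distance from median
        # Weight falls smoothly from 1 at the median to 0 at |u| >= 1.
        w = (1 - u * u) ** 2 if abs(u) < 1 else 0.0
        weighted_sum += w * x
        weight_total += w
    return weighted_sum / weight_total

# The extreme value 100 receives zero weight, so the estimate stays near 3.
print(tukey_biweight([1, 2, 3, 4, 100]))
```

Points close to the median contribute almost fully, while a far-off point such as 100 in the example contributes nothing, without anyone having to decide in advance which values count as outliers.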

Outside experts have also noted that Affymetrix’s existing algorithm fails to include a published statistical error model for its experiments, which means researchers have not known how much to adjust their data for variations in spot intensity, hybridization patterns, and intensity measurement sensitivity.

The new algorithm, which is to be incorporated in new versions of the microarray analysis software the company is planning to release in the fourth quarter, is designed to address these issues, Awad said.

The new algorithm has “tunable parameters,” said Awad. These include two separate sets of significance thresholds for GeneChip data.

Company scientists sought to validate the algorithm with a large, well-characterized dataset of human and yeast samples, using a Latin square experimental design (a matrix of unique data points). They used 14 samples (the rows in this square) and exposed each sample to 14 different RNA spike concentrations (the columns), resulting in a square with every spike group at every concentration. They examined one chip at a time, and then compared the chips using both the old empirical algorithm and the new algorithm. Their findings indicated that the new algorithm was “at least as robust as the old algorithm, and also provides the benefit of statistical confidence,” Awad said.
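The defining property of a Latin square, that every level appears exactly once in each row and each column, can be shown with a short Python sketch. The cyclic construction below is one standard way to build such a square and is an assumption for illustration; the article does not say how the 14-by-14 layout was actually arranged, and the entries here are concentration indices rather than real concentrations.

```python
def cyclic_latin_square(n):
    """Build an n x n Latin square by shifting each row one step."""
    return [[(row + col) % n for col in range(n)] for row in range(n)]

def is_latin_square(square):
    """Check that each symbol 0..n-1 occurs once per row and per column."""
    n = len(square)
    symbols = set(range(n))
    rows_ok = all(set(row) == symbols for row in square)
    cols_ok = all(
        {square[r][c] for r in range(n)} == symbols for c in range(n)
    )
    return rows_ok and cols_ok

# Rows and columns follow the 14 x 14 design the article describes;
# each cell holds a hypothetical concentration index.
square = cyclic_latin_square(14)
print(is_latin_square(square))
for row in square[:3]:
    print(row)
```

A design of this shape lets every spike be observed at every concentration while keeping each individual experiment balanced.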

This new algorithm “is certainly a move in the right direction, and I am happy to see Affymetrix responding to the concerns that have been raised by their customers,” said Michael Recce, director of the Center for Computational Biology and Bioengineering at the New Jersey Institute of Technology and a GeneChip user who has sought to improve upon Affy’s algorithms.

Statisticians attending the Seattle conference also indicated that Awad’s presentation looked interesting, but said they would have to see more details to evaluate the robustness of the algorithm.

Awad said Affymetrix scientists are planning to publish the algorithm in a major computational biology journal soon.

In the meantime, the new algorithm is to be introduced in year-end upgrades of Affymetrix software, including Microarray Suite 5.0, MicroDB Software 3.0, and Affymetrix Data Mining Tool 3.0, along with Affymetrix LIMS Manager 3.0, Affymetrix LIMS (LIMS Server Software) 3.0, and Affymetrix LIMS Development Server Software 3.0.

All current Affymetrix customers who have valid software maintenance agreements can receive the new upgrades free of charge. Any customer who has purchased the Affymetrix software within the past year and has completed and returned the enclosed license reply cards has a current maintenance agreement. More information is available on the company’s website, www.affymetrix.com.

Affymetrix is also “building a huge database” that offers “a comprehensive view of the human genome,” on its NetAffx website, Awad said.

NetAffx, which the company introduced in July, allows users to access 100- to 600-base sequences containing the 25-mer oligonucleotide probes on its GeneChip arrays, and to search multiple databases using these sequences. The site may include sequence and cluster visualization tools in the future. “This site is going to grow,” said Awad.

— MMJ
