Watch out, microarray analysis pipsqueaks: Here come the giants. Statistical analysis software heavyweights SAS and SPSS have both cast their tall shadows over the microarray data mining market, backed by organizational depth, capital, and reputations that few, if any, microarray-specific software providers can match.
SAS, which bills itself as the world’s largest privately-held software company, has formed a genomics group to leverage its statistical and data warehousing expertise to develop a genomics-oriented software package, said Russ Wolfinger, who is heading up the group as part of the company’s statistics and operations research. The system will incorporate microarray data, proteomics data, and metabolomic data.
Wolfinger described a vision of “a collaborative software system where statisticians and others can use our high-power tools to put together easy-to-use modules for their scientific colleagues.”
The Cary, NC-based company aims to price the software as a mid-tier product, less expensive than, say, Rosetta Resolver, but targeting a more statistically oriented audience than more microarray-specific systems such as Silicon Genetics’ GeneSpring and BioDiscovery’s GeneSight.
“What we can offer that is different than [existing microarray analysis packages] is the ability to handle complex experimental designs, and to offer warehousing in addition to analysis and web delivery,” said Wolfinger. “Most of our revenue comes from sectors like finance and insurance that are already handling terabytes of data.”
Even without being tailored for microarrays, the basic SAS package does have relational database capabilities as well as cluster analysis, principal components analysis, neural networks, and self-organizing maps.
As a result, a number of its customers, such as Monsanto, the Netherlands Institute of Applied Scientific Research, and the Siteman Cancer Center at Washington University in St. Louis, are already using SAS software for microarray analysis.
William Shannon, the co-director of the multiplexed gene analysis core at the Siteman Center, recently told BioArray News that he uses SAS as the primary microarray data management and analysis tool: “As a professional statistician it is important that I be able to control the fine tuning of a statistical algorithm. SAS gives me that kind of control.”
The company, however, does not have much expertise dealing with biology — a weakness that Wolfinger admits. To address this potential liability, SAS has partnered up with scientists such as North Carolina State University geneticist Greg Gibson, who uses microarrays in his explorations of Drosophila genotypes and phenotypes. Gibson, Wolfinger, and several others recently published papers in the December 2001 issue of Nature Genetics and the November-December issue of the Journal of Computational Biology in which they used SAS in a complex experiment examining the interactions between age, sex, and genotype and changes in gene expression.
Wolfinger said that word is already starting to get out in the genomic analysis community of SAS’ venture into the field, and the company may “shoot up some [trial] balloons” during the first or second quarter of this year to further test the market.
SPSS Hopes Clementine Will Become Darling of Array Analysis
Meanwhile, rival software behemoth SPSS of Chicago has turned its trunk toward the microarray arena, as it discovers that increasing numbers of scientists are using its predictive modeling program Clementine for gene expression analysis. The company has designed other application-specific templates for Clementine, and may build one designed for microarrays.
But like SAS, SPSS is billing its software — without any added bells and whistles — as a more powerful alternative to microarray-specific packages.
“Clementine is a very flexible tool,” said Ken Kirsten, SPSS senior product manager for science. “You can very easily merge data from different sources, such as gene expression data, clinical data, and ADME tox data.”
Clementine also provides workflow visualization, normalization and filtering mechanisms, and a range of modeling methods, including clustering, decision trees, association rules, and neural networks. It also includes the Clementine External Module Interface (CEMI), which lets users add other programs or techniques.
“There is not one standard [data modeling] technology to use for all microarray experiments,” said Kirsten, “so if you want to add your own clustering algorithm, fine.”
Another major feature that sets Clementine apart from microarray-specific data analysis programs, said Kirsten, is its capability for predictive modeling: users can not only cluster data but also feed new data into the model they have created to make predictions.
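To make the distinction concrete, here is a minimal sketch of predictive modeling in Python. It is not Clementine's method or interface: it assumes a simple nearest-centroid classifier, and the function names, gene values, and "tumor"/"normal" labels are all made up for illustration. The point is only that a model built from labeled samples can then assign a label to a sample it has never seen.

```python
from math import dist
from statistics import mean


def fit_centroids(samples, labels):
    """Average the expression profiles of each class into one centroid."""
    grouped = {}
    for profile, label in zip(samples, labels):
        grouped.setdefault(label, []).append(profile)
    return {
        label: [mean(values) for values in zip(*profiles)]
        for label, profiles in grouped.items()
    }


def predict(centroids, profile):
    """Return the label whose centroid is closest to the new profile."""
    return min(centroids, key=lambda label: dist(centroids[label], profile))


# Hypothetical expression values for three genes in four labeled samples.
training_profiles = [
    [2.1, 0.3, 1.8],
    [1.9, 0.4, 2.0],
    [0.2, 1.7, 0.5],
    [0.4, 1.9, 0.3],
]
training_labels = ["tumor", "tumor", "normal", "normal"]

# Build the model once, then apply it to a sample it was not trained on.
model = fit_centroids(training_profiles, training_labels)
new_sample = [1.8, 0.5, 1.7]
predicted_label = predict(model, new_sample)
print(predicted_label)
```

A clustering routine would stop after grouping the four training samples. The extra `predict` step is what turns the grouping into a model that can be reused on new data.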
A few groups have already published papers using Clementine for microarray data analysis, and last fall SPSS decided to increase its presence in the bioinformatics and microarray analysis world by setting up a booth at the Critical Assessment of Techniques for Microarray Data Analysis conference at Duke University in October. Werner Dubitsky and his colleagues from the German Cancer Research Center in Heidelberg presented two papers on machine learning methods for microarray data analysis, using Clementine to apply a decision tree algorithm. (Two other groups used SAS in their presentations.)
Currently, SPSS has plans to step up its marketing efforts to the genomics community, with booths at BIO-IT World in March and other conferences later in the year.
Microarray Software Companies: What, Me Worry?
With the entry of the statistics heavyweights into the ring, should microarray analysis software makers be recalculating their probabilities of success?
Alexander Kuklin, director of application sciences at BioDiscovery, thinks not.
“SPSS and SAS are fantastic tools that you can do anything with if you have the knowledge and training to program,” said Kuklin. “But to understand [them] takes a lot of time. Many molecular biologists do not know statistics and have to generate and analyze data under pressure. This is where companies like BioDiscovery come in.”
Kuklin added that he thought microarray software and statistical programs could actually be complementary, if researchers use the former for their first-pass analysis, and then consult statisticians using the latter for more thorny problems.
Given the expanding microarray market, however, SPSS and SAS are not likely to be long satisfied with this status as a default option.
“SPSS is a major player in data mining and statistics,” said Kirsten. “It is not something that was written yesterday by a few grad students. We have the resources to continue to develop any software we put on the market.”