Anyone who has walked through the exhibition hall of a recent genomics conference can testify that the offerings in microarray data analysis software can overwhelm the senses. Not only has just about every existing player in the increasingly crowded field rolled out a new release, but Packard (now part of PerkinElmer) and startups such as Gene Network Systems, Molecular Mining, and Iobion Informatics have also introduced their own data analysis packages.
While a quick look at glossy product literature, with its clustering screen shots and data tables, gives the impression that these programs are interchangeable, the fine print reveals some key differences. These differences center on five factors: enterprise-wide systems vs. desktop solutions, the number and kind of algorithms each system offers, the statistical analysis and visualization tools, the interface with other modules in the microarray process, and, finally, price.
At the recent IBC Chips to Hits conference in San Diego, and at this week's Cambridge Healthtech Institute Microarray Data Analysis conference in Alexandria, Va., BioArray News met with most of the software providers to sort out the distinguishing features of each program. A summary follows, along with the accompanying chart on the clustering algorithms and other tools the programs offer.
Rosetta Resolver v. 3.0: The Rolls-Royce Package Gets Fancier
At the end of October, the Rosetta Biosoftware division of Merck introduced version 3.0 of Rosetta Resolver, its enterprise-wide microarray data management and analysis program. The company has licensed Resolver to an all-star cast of 16 pharmaceutical and biotechnology companies, and the system is tailor-made for such multi-user organizations. This is what sets Resolver apart from the pack, according to Rosetta Biosoftware vice president and general manager Doug Bassett.
"Our customers turn to us when they need an enterprise solution to integrate multiple gene expression technologies with different varieties of data sets across research groups, and to get it all into one centralized, scalable, high-performance enterprise architecture," Bassett said. "A lot of competitors in our space are really focused on desktop solutions and only offer a database backend option. But we provide the core enterprise architecture."
The new version of Resolver adds several new killer apps to the data analysis tools included in earlier versions, and it still lets users custom-fit the program with their own analysis techniques, Bassett said. The first of these add-ons is a set of class prediction algorithms, which users can train to assign a gene expression pattern to one of several known classes of data. For example, a user could train the algorithm with two sets of gene expression data, one from a group that responded to a treatment and the other from non-responders. When an unknown gene expression sample is then introduced, the algorithm can classify it according to the known patterns of responders or non-responders. The other new application is principal component analysis, which helps users winnow their gene data sets down from thousands of genes to fewer than a hundred key genes.
Despite these cool features, the reason many smaller users give for not choosing Resolver is price. Although the company does not disclose pricing publicly, the list price for a standard enterprise system is said to be around $250,000, and users such as Bristol-Myers Squibb and GlaxoSmithKline are reportedly paying up to $1 million for enhanced versions. But Bassett said it is a misperception to think that Resolver's price is out of reach for smaller groups. "You could buy a very small system with, say, two or four concurrent user licenses," he said. "And every desktop within the organization could be a client and be able to access the server." (A concurrent license allows only one user to be logged on at a time, but does not limit the total number of users.) While Bassett declined to name a price for this small system, he said it was nowhere near the $250,000 figure quoted and would be competitive with others on the market.
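Rosetta has not described how its class prediction algorithms work. As a rough illustration of the train-then-classify idea in the responder/non-responder example only, the sketch below uses a nearest-centroid classifier, one of the simplest class prediction methods. The function names and the three-gene log-ratio profiles are hypothetical.

```python
import math

def centroid(profiles):
    """Average each gene's expression value across a list of profiles."""
    n = len(profiles)
    return [sum(values) / n for values in zip(*profiles)]

def train(labeled_profiles):
    """Map each class label (e.g. "responder") to the centroid of its training profiles."""
    return {label: centroid(profiles) for label, profiles in labeled_profiles.items()}

def predict(model, sample):
    """Assign an unknown sample to the class whose centroid is closest (Euclidean distance)."""
    return min(model, key=lambda label: math.dist(model[label], sample))

# Hypothetical log-ratio profiles for three genes per patient sample.
training_data = {
    "responder": [[2.1, -0.5, 1.0], [1.8, -0.7, 1.2]],
    "non-responder": [[-0.4, 1.5, -0.9], [-0.6, 1.1, -1.1]],
}
model = train(training_data)
print(predict(model, [1.9, -0.4, 0.8]))  # responder
```

Practical class predictors typically also select informative genes and estimate accuracy by cross-validation; this sketch omits both.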
GeneSpring v. 4.1: Ford or Flexi-Flyer?
Silicon Genetics, which has become Rosetta's chief competitor (especially since Rosetta exec Deepak Thakkar left to become Silicon Genetics' CEO), has just sprung version 4.1 of its popular GeneSpring software on the market. While GeneSpring may lack the luxury features of Resolver, the company boasts 3,000 to 4,000 users. Another critical difference, according to Thakkar: "GeneSpring is a scalable and modular product. It can go from a desktop for a single scientist to a global enterprise framework."
With this scalability comes a smaller price tag: individual licensees can pay under $40,000 (although the license, which costs a fraction of that per year, must be renewed at full price every year).
The latest version of GeneSpring includes automated data loading, normalization functions, new visualization tools, statistical filters, and more advanced annotation and search functions. In addition, GeneSpring contains most of the popular clustering algorithms, along with principal component analysis and pathway prediction. The company also shares its API with other companies so they can fit their own algorithms to GeneSpring. The only add-in currently available is ArrayMiner, from Belgian company Optimal Design, but Thakkar said at this week's Cambridge Healthtech Institute conference in Alexandria that at least two other groups have developed algorithms for GeneSpring and had approached him at the booth to obtain the API so they could make those algorithms work with it.
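Principal component analysis, offered by GeneSpring and newly added to Resolver 3.0, looks for the directions along which expression data vary most. Neither vendor has published its implementation; the sketch below is only an illustration, assuming plain Python, that finds the first principal component by power iteration on the covariance matrix. The function name and toy data are hypothetical.

```python
import math

def first_principal_component(rows, iterations=200):
    """Return the unit vector of greatest variance for a list of equal-length rows.

    Each row is one observation (e.g. one array); each column is one variable.
    Assumes the data are not all identical (otherwise the norm below is zero).
    """
    n_rows, n_cols = len(rows), len(rows[0])
    means = [sum(col) / n_rows for col in zip(*rows)]
    centered = [[x - m for x, m in zip(row, means)] for row in rows]
    # Sample covariance matrix of the columns.
    cov = [
        [sum(r[i] * r[j] for r in centered) / (n_rows - 1) for j in range(n_cols)]
        for i in range(n_cols)
    ]
    # Power iteration: repeatedly multiply by the covariance matrix and renormalize.
    # The vector converges to the eigenvector with the largest eigenvalue, assuming
    # that eigenvalue is dominant and the start vector is not orthogonal to it.
    v = [1.0] * n_cols
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(n_cols)) for i in range(n_cols)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# Toy data in which the second variable is always twice the first.
data = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
print(first_principal_component(data))  # roughly [0.447, 0.894], i.e. (1, 2) / sqrt(5)
```

Projecting each centered row onto this vector gives its score on the first component; further components can be found the same way after removing (deflating) the first.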
Seeing the Difference in GeneSight
With a name like GeneSight, it's easy to see how Marina del Rey, Calif.-based BioDiscovery's microarray data mining software might be confused with GeneSpring. And in many respects, namely its desktop approach and its price, GeneSight is not much different from its homologously named counterpart.
So what sets GeneSight apart? Three things, according to Greg Moore, vice president of business development: the ease with which a user can import data using the program's AutoImport Wiz; an icon-based drag-and-drop feature for handling data transformations, such as log transformations or custom methods that a user can save; and a very rich statistical program. "The program does perform cluster analysis, but cluster analysis is only useful if the data is verified statistically," said Moore.
Another feature that distinguishes GeneSight is price. With a list price of $7,995 for a one-year single-user license and a yearly maintenance fee of about $1,600, GeneSight sharply undercuts GeneSpring in the price department.
While this price tag positions GeneSight for the desktop user, BioDiscovery is including GeneSight in a new enterprise-wide package called GeneDirector, which it will release before the end of the year, Moore said. GeneDirector integrates the company's CloneTracker, ImaGene imaging software, and GeneSight under one umbrella and allows researchers to customize their laboratory management, he added. The system is in late beta testing and will carry a $120,000 list price, which includes licenses for all of the programs.
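The article does not say which log transformation GeneSight applies. As a hedged illustration, a common choice for two-color spotted arrays is the base-2 log of the ratio between the two channel intensities, which makes two-fold increases and decreases symmetric around zero. The function name and the intensity floor below are assumptions.

```python
import math

def log2_ratios(red, green, floor=1.0):
    """Convert paired two-channel spot intensities into log2(red / green) values.

    Intensities below `floor` are clamped to it so that zero or negative
    background-subtracted values do not break the logarithm. The default
    floor of 1.0 is an illustrative assumption, not a standard.
    """
    return [math.log2(max(r, floor) / max(g, floor)) for r, g in zip(red, green)]

# A two-fold increase, a two-fold decrease, and no change.
print(log2_ratios([4000, 500, 1000], [2000, 1000, 1000]))  # [1.0, -1.0, 0.0]
```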
Molecular Mining Digs in with GeneLinker Gold v. 1.1
Undaunted by the apparent array of competition in the microarray data analysis market, Molecular Mining, a Kingston, Ontario-based startup, released its flagship desktop product, GeneLinker Gold v. 1.1, this fall. Like GeneSight, it aims at ease of use and affordability. Along with common clustering algorithms and similarity measures, this release includes the unique Jarvis-Patrick algorithm. This method, also known as mutual nearest neighbors, "gained prominence in the cheminformatics world, as it was discovered that Jarvis-Patrick could get good intrinsic clusterings of large chemical structure libraries without making unwarranted assumptions that the data were globular, or normally distributed," said Evan Steeg, CEO of Molecular Mining. "Our scientists have found examples in gene expression data where Jarvis-Patrick identified clusters that reflected the true pathway interactions better than results from other clustering methods."
Molecular Mining also offers a deluxe version of its software, GeneLinker Platinum, which includes "unique and proprietary advanced data mining and prediction technology," Steeg said. The company's strategy is not only to sell this software to users but also to perform analysis services for other organizations, including existing partners Avalon Pharma and the National Institute of Environmental Health Sciences.
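Molecular Mining has not published its implementation. The sketch below follows the commonly described Jarvis-Patrick rule: each point (here, a gene's expression profile) gets a list of its k nearest neighbors, and two points are joined when each appears in the other's list and the two lists share at least a threshold number of neighbors. Clusters are the connected groups that result, so the method needs no preset number of clusters and no assumption about cluster shape. The function names, the exclusion of each point from its own neighbor list, and the toy data are assumptions.

```python
import math

def nearest_neighbor_sets(points, k):
    """For each point, the set of indices of its k nearest other points (ties broken by index)."""
    sets = []
    for i, p in enumerate(points):
        ranked = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        sets.append({j for _, j in ranked[:k]})
    return sets

def jarvis_patrick(points, k, min_shared):
    """Label each point with a cluster number using the Jarvis-Patrick criterion."""
    neighbors = nearest_neighbor_sets(points, k)
    parent = list(range(len(points)))  # union-find forest

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(len(points)):
        for j in neighbors[i]:
            # Join i and j only if they are mutual neighbors that share enough neighbors.
            if i in neighbors[j] and len(neighbors[i] & neighbors[j]) >= min_shared:
                parent[find(i)] = find(j)

    # Renumber cluster roots as 0, 1, 2, ... in order of first appearance.
    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(len(points))]

# Two well-separated groups of hypothetical two-gene profiles.
profiles = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(jarvis_patrick(profiles, k=2, min_shared=1))  # [0, 0, 0, 1, 1, 1]
```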
Packard (PerkinElmer) Moves from Hardware
Packard Bioscience of Meriden, Conn., entered the microarray analysis market this fall with ArrayInformatics, a microarray data management and analysis system. Even though Packard was formally acquired by PerkinElmer on Tuesday, it is pressing forward with the marketing of this package and was promoting it this week at the CHI Microarray Data Analysis conference. According to Packard representatives, PerkinElmer was particularly interested in this package and plans to integrate the software with Packard's hardware and its NEN Micromax microarrays and reagents into a comprehensive microarray portfolio.
"We are the only company that has a comprehensive approach to microarrays," said Michael Megginson, a Packard BioChips sales engineer. Researchers, he said, can simplify their lives if they have to deal with only one company, and one that values customer service.
In addition to marketing this analysis package as part of a unified microarray solution, the company positions it as a start-to-finish data management and analysis program, in contrast to programs like GeneSpring, which do not perform data management and storage functions.
GeneTraffic v. 1.0 Honks its Horn
But Packard is not the only player trying to get into the data management side of data analysis. Iobion Informatics of San Diego, which has just released the initial version of its GeneTraffic client-server software, also hopes to wedge into the crowded field with this type of product. Stephen Sharp, Iobion's director of marketing, drew what he called a road map to illustrate the software's role in the data processing pipeline: the first stop is scanning, followed by image analysis, then GeneTraffic for data filtering, and finally a data mining program such as Spotfire or GeneSpring.
Why add this extra step? GeneTraffic is designed for storage, normalization, validation, and initial clustering analysis, the company said. "This is a data management system built on a statistical package (the R statistical package)," said Sharp. "Is it as rich as GeneSpring or Spotfire in visualization methods? No. But the underlying statistics are much richer. And other programs have no data management capabilities."
The program does include the common clustering algorithms, k-means and hierarchical clustering, as well as basic techniques to visualize these clusters, and the company plans to increase the visualization capabilities in the next release, due out early next year, Sharp said.
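Iobion has not detailed its implementation; R's standard stats package includes kmeans() and hclust() functions, though the article does not say whether GeneTraffic uses them. For readers unfamiliar with k-means, the sketch below shows the basic Lloyd iteration: assign each profile to its nearest center, move each center to the mean of its members, and repeat until the assignments stop changing. Passing explicit starting centers, rather than choosing them at random, is an assumption made here so the example is deterministic.

```python
import math

def kmeans(points, initial_centers, iterations=100):
    """Cluster points with Lloyd's k-means, starting from the given centers."""
    centers = [list(c) for c in initial_centers]
    assignment = None
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest center.
        new_assignment = [
            min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each center to the mean of its members
        # (a center with no members keeps its previous position).
        for c in range(len(centers)):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centers[c] = [sum(v) / len(members) for v in zip(*members)]
    return assignment, centers

points = [(0, 0), (0, 1), (10, 10), (10, 11)]
labels, centers = kmeans(points, initial_centers=[(0, 0), (10, 10)])
print(labels, centers)  # [0, 0, 1, 1] [[0.0, 0.5], [10.0, 10.5]]
```

Because the result depends on the starting centers, k-means is usually run several times from different starts; hierarchical clustering, by contrast, builds a full tree and needs no starting point.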
BioMine Tries to Be the Porsche of the Pack
Gene Network Systems, a small startup based in Ithaca, N.Y., has dared to step into the microarray market with BioMine, a program designed by Princeton University physics professor Vipul Periwal. Instead of serving up the traditional menu of clustering algorithms, the package offers a series of sporty innovations: techniques borrowed from statistical physics and quantum field theory, according to the company.
One of these novel techniques, the BiCluster algorithm, is designed to find direct correlations between samples and genes. Another, percolation clustering, which is somewhat similar to hierarchical clustering, uses probability to calculate the connectivity of different data points and then averages different configurations of points to create what the company calls an ensemble average tree. This algorithm is superior to hierarchical clustering because it does not fall into the trap of early false association that can happen with hierarchical clustering, the company said. The program also includes a super-paramagnetic clustering algorithm that associates a statistical-mechanical system of spin variables with the input genes and then uses the distance or similarity between genes to set the strength of the interaction between their spins.
The program avoids k-means and hierarchical methods, according to the company, because "k-means and hierarchical methods ... are inadequate for recognizing subtle correlations and at the same time inadequate in recognizing outliers."
Spotfire: Competition or Complementary?
In addition to the panoply of microarray-specific analysis packages recently introduced or updated, several companies have been marketing their general purpose data mining packages as superior solutions for microarrays.
Spotfire, of Cambridge, Mass., says that its DecisionSite application can offer microarray users unique ways to visualize their data. "We offer the ability to do dynamic querying of data, the ability to do constant 'what if' analysis: what if variables were here, and what if there," said Bill Ladd, Spotfire's director of bioinformatics. "We see the relationship between variables: if you mark a record in one plot, you see where it is on other plots. You can then move quickly between the hypothesis and whether the data supports the hypothesis."
Spotfire, he said, can be used in connection with microarray-specific programs.
The program costs $5,500 for an individual seat license, before discounts for larger numbers of licenses, so it is not beyond the reach of companies wishing to combine a management-analysis program with a visualization package.
Spotfire, however, may soon be catching some competitive heat from OmniViz, a Maynard, Mass., bioinformatics company spun out of technology giant Battelle. The company, which displayed its eponymous software at the Chips to Hits meeting two weeks ago, offers three-dimensional views of gene clusters, text clustering based on literature searches for genes, and the ability to overlay one view on the other. In addition to a selection of clustering methods, from k-means to hierarchical methods, the program offers a unique Galaxy plot, which lets users analyze large datasets and identify their principal components in a three-dimensional format. The company calls this the next step after principal component analysis.
This program, which is already in use at Johnson & Johnson and other pharmaceutical companies, is currently at release 1.611. After a period of use behind the firewalls of OmniViz's large customers, version 2.5 is due to be released to the public at the end of the year. Given that this software is more complex than Spotfire, it will also cost more. The price is said to be in the five-figure range, but the company declined to give specific figures.
Partek: Not Rolls, but…
Partek, of St. Charles, Mo., has also geared its data mining program, Partek Pro, toward the microarray sector, and has recently introduced a new release, version 5.0, with a base price of $5,000 per user.
While Rosetta may be the Rolls-Royce of the sector, Partek is more like Mitsubishi, with its wider-ranging products and customer list. The company counts more than 50 institutions as customers, including the pharmaceutical companies Pfizer, Merck, and Bristol-Myers Squibb, as well as the National Institutes of Health.
Partek CEO Thomas Downey sees the company's more general toolbox as an asset in the microarray field rather than a liability, because it incorporates rigorous statistical methods used in other sciences instead of simply gathering together a group of clustering algorithms.
"Cluster analysis is not the hammer for every nail in microarray analysis. It is an exploratory tool and is insufficient by itself," Downey said. "The question I see people asking is, 'What genes are different, and how sure am I of this?' Partek uses the tried-and-true statistical analysis toolbox to answer this question."
In addition to classic normalization methods, the company has developed approaches to data analysis that account for the unusually large number of data points in each sample and the small number of samples. (This is the reverse of traditional data sets such as clinical trial data, where there are many samples and few variables per sample.) Instead of using the traditional p-value cutoff of 0.05, which could be expected to yield as many as 500 false positives on a 10,000-spot array, Partek uses the false discovery rate, which controls the expected proportion of false positives among the genes declared significant and keeps it at a reasonable level.
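The arithmetic behind the 500 figure: testing 10,000 spots at p < 0.05 when no gene truly changes would be expected to flag 10,000 × 0.05 = 500 spots by chance alone. The article does not say which false discovery rate procedure Partek uses; the best-known one is the Benjamini-Hochberg step-up procedure, sketched below. It ranks the m p-values from smallest to largest, finds the largest rank i whose p-value is at most (i / m) × q, and declares every gene up to that rank significant. The function name, the q = 0.05 default, and the ten example p-values are illustrative assumptions.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return the sorted indices of tests declared significant at false discovery rate q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p-value
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        # Step-up rule: remember the largest rank whose p-value clears its threshold.
        if p_values[i] <= rank / m * q:
            cutoff = rank
    return sorted(order[:cutoff])

p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]
print([i for i, v in enumerate(p) if v < 0.05])  # [0, 1, 2, 3, 4]: naive cutoff
print(benjamini_hochberg(p))                      # [0, 1]: FDR-controlled
```

Because the threshold scales with each p-value's rank, the number of genes called significant adapts to how many strong signals the array contains, rather than resting on a fixed per-gene cutoff.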
Partek, like others, is still working on the statistical aspect of its program. Given that there is still a debate as to the number of replicates needed to make a data set robust, as well as the type of statistical tests to use, the future of this program, and the future of many microarray analysis programs, is still open-ended.