Revolution Computing, a statistical software startup based in New Haven, Conn., earlier this month launched its flagship product, a commercial version of the open source statistical package R called RPro.
The firm joins another software vendor following a similar business model — Random Technologies, which last year launched its own commercial R package, also dubbed RPro at the time but now renamed RStat [BioInform 07-06-08].
Both firms are targeting their tools at bioinformatics and other life-science users, as well as other vertical markets that rely on statistical software.
Revolution’s customer base is currently around “40 percent life sciences, 40 percent finance, and 20 percent other industries like manufacturing and oil and gas,” said Richard Schultz, the company’s CEO. He said that growth has been “much more active of late” in the life-science area compared to other markets.
Both companies hope to fill a gap that exists between unsupported open source R packages, like the Bioconductor microarray analysis suite, and commercial, closed-source statistical packages like those from Insightful, which markets S-Plus.
“We have come full circle,” commented David Rocke, biostatistician at University of California, Davis, School of Medicine in an e-mail to BioInform.
“S-Plus was the commercial, supported version of S. Revolution Computing and others are now offering commercial, supported versions of R. The market will tell whether their value-added services can compete with the free version of R on the one hand and with S-Plus on the other,” Rocke said.
Revolution’s focus is on “parallelizing” R across multiple processors. The company’s first product, ParallelR, enabled users to run R code on multicore, multiprocessor, grid, and cluster systems in order to speed their analysis.
The new product, RPro, adds “enterprise-level” support to the open source version of R, including precompiled binaries and installers for Windows, Linux, and Solaris; audit logs; bug fixes for current and previous releases; and other features. RPro also enables R routines to execute in parallel on multiprocessor computers, the company said.
“Our end user is not typically a computer scientist, but a scientist using familiar tools like R and they haven’t had to learn a high-performance programming language to get the benefits of what we are offering,” Schultz said.
He added that he sees the firm as following in the footsteps of companies like Red Hat, which has built a successful commercial business around services for the open source Linux operating system.
The advantage of open source, Schultz said, is that “you have an active, distributed, and engaged thought leadership community throwing out algorithms and technologies all the time.” However, he noted, only partially in jest, that this is also the primary disadvantage of open source projects.
“There are a lot of moving parts” with open-source development, he said, which opens up an opportunity for firms like Revolution that can package these tools in user-friendly form.
Schultz estimated that there are around 2 million R users who rely on the suite for a range of statistical applications, including gene expression data analysis and clinical trials evaluation, so it is perhaps understandable that the firm sees a ripe commercial opportunity in that market.
Schultz said that the firm’s focus on parallelization should be particularly attractive to customers looking to gain higher performance from R.
“Open source R is only single-threaded; it runs on one core of one processor of one computer,” he said. That limitation holds regardless of the size of the computer cluster available at a given site, and it restricts how R can be used, he said.
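To make the single-threaded limitation concrete, the sketch below contrasts running a set of independent statistical tasks one after another with farming them out to several worker processes. It is written in Python rather than R, and it is not Revolution's implementation or the ParallelR API; the sample data and the names `bootstrap_mean`, `run_serial`, and `run_parallel` are hypothetical, chosen only to illustrate the general pattern.

```python
import random
from concurrent.futures import ProcessPoolExecutor

# Made-up sample values, used only for illustration.
DATA = [2.1, 3.4, 1.9, 5.6, 4.2, 3.8, 2.7, 4.9]


def bootstrap_mean(seed):
    """One bootstrap replicate: resample DATA with replacement and average it."""
    rng = random.Random(seed)  # a seeded generator keeps each replicate reproducible
    sample = [rng.choice(DATA) for _ in DATA]
    return sum(sample) / len(sample)


def run_serial(seeds):
    """Single-threaded baseline: replicates run one after another on one core."""
    return [bootstrap_mean(seed) for seed in seeds]


def run_parallel(seeds, executor_cls=ProcessPoolExecutor, workers=None):
    """Run the same replicates through a pool of workers.

    Each replicate is independent of the others, so the work can be split
    across workers without changing bootstrap_mean itself. With the default
    ProcessPoolExecutor, workers=None lets the pool pick a worker count
    based on the machine's processors.
    """
    with executor_cls(max_workers=workers) as pool:
        # map() returns results in the same order as the input seeds.
        return list(pool.map(bootstrap_mean, seeds))
```

Called from a script's `if __name__ == "__main__":` block, `run_parallel(range(1000))` should return the same list as `run_serial(range(1000))`, since each replicate depends only on its seed. Any speedup depends on the number of cores and on how expensive each task is relative to the cost of starting and feeding the worker processes.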
“RPro uses high-performance math libraries to take advantage of multi-core platforms,” said Schultz. These libraries, which replace the reference math libraries that ship with the open source version of R, are “implicit” to the code and otherwise invisible to the user, added his father, Martin Schultz, who is the firm’s chief scientific officer. The elder Schultz was a long-time researcher in parallel computation at Scientific Computing Associates, a statistical software developer and consulting company.
As Revolution Computing outlined in a statement announcing the new product, the linear algebra libraries let RPro deliver performance enhancement for R analyses “without requiring script modifications,” allowing many matrix and vector operations to run as much as ten times faster on multi-core service management platforms.
Revolution counts Novartis, Pfizer, and the Yale Cancer Center among its life science customers. In January, the firm announced that it had parallelized and accelerated Pfizer’s predictive modeling package Caret, which stands for “classification and regression training.” The parallelized version, called CaretNWS, is now available via the Comprehensive R Archive Network.
In a statement, Max Kuhn, associate director of non-clinical statistics at Pfizer, said that “the ability to conduct large data analysis across multi-core processors represents a significant benefit for drug discovery and development.”
Richard Schultz explained that Revolution Computing’s customers are found both in industry and academia, and the firm offers “special academic pricing,” though he did not disclose further pricing details.
The company expects RPro to appeal even to high schools, he said, where advanced courses in statistics have become common. Statistics is much broader than one might assume, Martin Schultz said. “It’s extending and merging with what used to be called machine learning or data mining and also merging with bioinformatics.”
How R You Different?
But Revolution isn’t the only firm hoping to capitalize on the popularity of the open source statistical package.
Random Technologies was founded by Gregory Warnes, who is also assistant professor in the University of Rochester’s department of biostatistics and computational biology. Previously a pharmaceutical statistician at Pfizer, he noticed a need for a commercially supported version of R. Warnes is also a long-time R developer and has written 15 extension packages for R.
Warnes said he originally worked with Revolution Computing but decided to start his own venture last year. The company is still ramping up and has fewer than 20 employees. Located in Pittsford, NY, the firm offers statistical consulting, software development, validation, and training services.
Random’s customers, Warnes said, are in the pharmaceutical space, in finance, and at educational institutions across the US and Western Europe.
Random launched RStat, which Warnes described as a “very similar product” to RPro, last year.
Whereas Revolution Computing has applied its parallel computing expertise to its products, Warnes said, “I have applied the direction of providing the software validation, qualification documents, and tools. I think that is the fundamental difference” between the two firms.
Both companies highlight the importance of validation of their R products. As Richard Schultz explained, Revolution tests reliability intensely before shipping software to its customers.
Since open source software has some elements that might “be well-tested and others are not tested at all … having a production process behind that can be extremely beneficial,” he said, particularly for clients who are concerned with FDA regulatory compliance.
He added that validation is “a large problem and solving it effectively for top-tier pharma is difficult,” noting that his firm has “made some progress” in that regard.
Random’s Warnes agreed that validation for FDA requirements is a key issue for customers and said he is developing templates for users to accomplish their own validation tasks for RStat.
“As soon as you transition from the experimental use of R tools for research purposes to using the tools for FDA filings and addressing FDA questions, [pharmaceutical firms] hit this large wall, a documentation hurdle,” he said.
“The goal that I have is to take that wall down from being six man-weeks of work to being 20 hours’ worth of work,” Warnes said.
Backed by Intel
In January, Revolution received a Series A investment of an undisclosed amount from Intel Capital — support that Richard Schultz said should give the firm an advantage in the marketplace.
Intel Capital provided the funding under its Open Source Incubator Program, which “was created specifically to drive investments in open source projects and accelerate the adoption of open source on Intel platforms,” Intel said.
The chip giant has a longstanding involvement with the open source world, starting with an investment in Red Hat in 1998.
Intel was motivated, Schultz said, by Revolution’s focus on parallelization. “So there is some natural synergy from the high-performance computing aspect,” he said. “We help to optimize their chip performance.”
Revolution Computing has around 30 employees, including biostatisticians, software engineers, customer support personnel, and liaisons to the R community. The firm’s advisory board includes Apache Software founder Brian Behlendorf and VA Linux’s Larry Augustin, and its board of directors includes officials from Intel and Red Hat.