When distributed computing emerged as a viable high-performance computing option in the late 1990s, a number of vendors pegged the life sciences sector as a prime candidate for whole-hearted adoption of the approach.
While those early predictions may have been overly optimistic (many of those vendors are no longer in business), some IT managers in pharma and academia are finding that the approach does offer benefits over clusters and other computing options, provided that users are willing to grapple with a few unresolved issues.
Distributed computing, which harnesses the unused computational power of desktop PCs and falls under the broader umbrella of so-called grid computing, is making inroads at several large pharmaceutical companies, including Novartis, Johnson & Johnson, GlaxoSmithKline, and Sanofi-Aventis, which are all using United Devices' Grid MP platform.
Novartis entered a pilot project with UD in 2002 in which UD's software was installed on 1,400 desktop PCs [BioInform 09-02-02], and has since expanded the platform to around 3,000 PCs across five sites worldwide. Manuel Peitsch, global head of systems biology at the Novartis Institutes for BioMedical Research, told BioInform this week that the company is primarily using the platform for in silico docking experiments, and that it is considering extending it into other applications, such as text-mining and computational systems biology.
In silico docking is particularly well-suited to the distributed architecture, Peitsch said, because it requires a great deal of compute power, but "you don't need to have inter-processor communication between the various CPUs in the process." Any so-called "embarrassingly parallel" application would be able to take advantage of this capability, he said.
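To make the "embarrassingly parallel" idea concrete, here is a minimal Python sketch in which each ligand is scored independently, so no task needs to communicate with any other. The function `score_ligand` is a hypothetical placeholder, not a real docking algorithm, and the local thread pool only stands in for the desktop PCs of a grid; none of this reflects how Grid MP or Novartis' software is actually implemented.

```python
from concurrent.futures import ThreadPoolExecutor


def score_ligand(ligand: str) -> int:
    """Hypothetical stand-in for a docking score.

    A real docking code would evaluate the ligand against a protein
    target; this toy version only needs to be deterministic.
    """
    return sum(ord(ch) for ch in ligand) % 100


def dock_all(ligands, max_workers=4):
    """Score every ligand independently of the others.

    No task reads another task's input or result, which is the property
    that makes the workload embarrassingly parallel.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the inputs.
        scores = list(pool.map(score_ligand, ligands))
    return dict(zip(ligands, scores))


if __name__ == "__main__":
    # Illustrative ligand names only.
    print(dock_all(["ligand-a", "ligand-b", "ligand-c"]))
```

Because the tasks share nothing, the same work units could in principle be handed to any number of machines, and the results simply collected at the end.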
UD has outlasted former rivals such as Parabon and Entropia to emerge as the leading distributed computing vendor, although it still faces some competition from Platform Computing, which sells a desktop PC grid product called Platform LSF Desktop Support as part of its flagship LSF cluster-management software family.
On its website, Platform lists Bristol-Myers Squibb, the European Bioinformatics Institute, Incyte, Monsanto, the State University of New York, and the Wellcome Trust Sanger Institute among its life science customers for LSF, though it was not clear how many of these customers are using the company's software to harvest spare desktop cycles.
A Platform spokesperson could not be reached for comment in time for publication.
In addition to its pharma clients, UD counts a number of academic research groups among its life science customers, including the Children's Memorial Research Center, the American Diabetes Association, Oxford University, and the Japan Biological Information Research Center. Several of these academic groups have been involved in large-scale community research projects using the company's grid.org portal, which is powered by more than 3 million PCs worldwide.
Eric Bremer, director of brain tumor research at Children's Memorial Research Center, said that he opted for UD's Grid MP as a low-cost alternative to a cluster when his lab began a text-mining project to extract information about genes associated with pediatric brain tumors from more than 125,000 articles from 21 journals.
Bremer said that a grid of around 20 to 30 desktops cut the time for a single search cycle from 24 hours on a single PC to less than an hour.
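As a rough check on those figures, the short Python sketch below computes the implied speedup and the per-machine parallel efficiency. It assumes a parallel run time of exactly one hour; since Bremer reported "less than an hour," the results are lower bounds. The function names are illustrative.

```python
def speedup(serial_hours: float, parallel_hours: float) -> float:
    """Ratio of single-PC run time to grid run time."""
    return serial_hours / parallel_hours


def efficiency(serial_hours: float, parallel_hours: float, workers: int) -> float:
    """Speedup divided by the number of machines (1.0 means perfect scaling)."""
    return speedup(serial_hours, parallel_hours) / workers


if __name__ == "__main__":
    # 24 hours on one PC, assumed 1 hour on the grid (an upper bound on grid time).
    for workers in (20, 30):
        print(workers, speedup(24, 1), efficiency(24, 1, workers))
```

With 30 desktops, a one-hour run corresponds to at least 80 percent efficiency. With 20 desktops the same figures would imply better-than-linear scaling, which suggests the grid was nearer the upper end of the quoted range or that the machines were not identical to the original PC.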
Based on the success of that project, Bremer said that he would like to scale up to around 100 PCs in order to expand the scope of the text-mining effort to "a couple hundred" journals over a fifteen-year span. However, he said, a potential "roadblock" exists in the form of inflexible third-party software licensing models.
CMRC is using software from two vendors in the project: LexiQuest Mine, a text-mining application from SPSS, and GetItRight, a data-migration program from CTH Technologies.
Since text-mining is a computationally intensive process, SPSS had some incentive to work out a suitable licensing arrangement with CMRC and UD, Bremer said. CTH, on the other hand, required the institute to purchase a separate license for each machine in the grid, a model that will be financially unfeasible for the institute if it tries to scale up to 100 PCs.
"I think that software companies really have to rethink their licensing models for the cluster or grid environment," Bremer said. "This is a roadblock that makes it unusable if they're going to insist on that."
Bremer said that a better model for his lab's budget would be based on a per-use or per-application basis "as opposed to per-processor."
Novartis' Peitsch noted that third-party software licensing issues are still a problem for distributed computing and that the company intends to address this hurdle in the future by sticking to its own in-house software. "We're primarily focusing on the stuff we own, where we own the code and everything," he said.
Inflexible licensing models threaten to eliminate the cost savings of desktop harvesting, Peitsch noted. "It's basically a cost issue," he said, "and the licensing model hasn't evolved in line with grid technology yet."
Peter Shenkin, vice president of software development at computational chemistry firm Schrodinger, agreed that "the licensing model is indeed a great impediment" to broader adoption of distributed computing, but he said that this issue isn't as much of a problem in the life sciences as it is in other HPC-heavy industries, such as aerospace and oil and gas, where most software packages are "tied to the box."
Shenkin said that Schrodinger offers a "floating license" that allows up to 100 CPUs to run its software simultaneously. While this model "isn't perfect," he said, "it's the best suited out of any of the commonly available licensing models" to grid computing.
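The idea behind a floating license can be sketched as a shared pool of seats that any machine may borrow while it runs a job, in contrast to a license tied to one specific box. The Python below is a minimal illustration under that assumption; the class `FloatingLicensePool` and its methods are hypothetical and do not describe Schrodinger's actual license mechanism.

```python
import threading


class FloatingLicensePool:
    """Toy model of a floating license: a fixed number of seats shared by all machines."""

    def __init__(self, seats: int = 100):
        # BoundedSemaphore raises ValueError if more seats are returned than were taken.
        self._seats = threading.BoundedSemaphore(seats)

    def try_checkout(self) -> bool:
        """Take a seat if one is free; return False instead of waiting."""
        return self._seats.acquire(blocking=False)

    def checkin(self) -> None:
        """Return a seat to the pool when a job finishes."""
        self._seats.release()


if __name__ == "__main__":
    pool = FloatingLicensePool(seats=100)
    granted = sum(pool.try_checkout() for _ in range(150))
    print(granted)  # number of jobs that obtained a seat out of 150 requests
```

Under this model the cost depends on peak simultaneous use rather than on how many PCs are enrolled in the grid, which is why it fits desktop harvesting better than per-machine licensing.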
A better option would be a "utility-based model" that would allow users to pay as they go, Shenkin said, though he noted that small software shops like Schrodinger "can't work like the phone company," making this model unlikely in the foreseeable future.
While licensing issues "have never been a limiting factor" for Schrodinger's customers who are using distributed systems, Shenkin noted that some of the company's pharma customers have experienced unforeseen technical glitches with their distributed architectures. For example, he said, depending on how many PCs a company has in a distributed system, the submission of a compute job could cause "a burst of traffic" that slows down the entire network.
In one case, he said, "a large company" was running Schrodinger's software on a distributed system, and users were reporting that their desktops slowed down considerably every time a job began, even though vendors such as UD and Platform claim that these applications should run unobtrusively in the background. Shenkin said that the company traced the problem to the antivirus software it had installed on all its PCs, which was checking "megabytes of input" every time the distributed application started up, slowing down the machines.
Despite these occasional glitches, Shenkin said that Schrodinger is seeing increased interest in distributed computing architectures from pharmaceutical customers who are running both UD and Platform. This interest has picked up "significantly" in the last year, he said, because the company has only recently begun offering grid-enabled versions of its software on the same release schedule as its software for other platforms.
Shenkin, Peitsch, and Bremer will all be speaking at UD's annual user conference in Austin, Texas, March 28-29.
— Bernadette Toner