Next week, Microsoft will begin offering evaluation versions of Windows Compute Cluster Server 2003 — the first version of the Windows operating system designed specifically for high-performance computing.
The product, which has just been released to OEMs and is expected to be generally available in August, signals the software giant's official entry into the high-performance computing market and marks another step in an ongoing strategy to court the bioinformatics sector.
Kyril Faenov, director of high-performance computing at Microsoft, told BioInform that the company has identified life sciences as one of its primary markets for CCS 2003. "We're certainly viewing life sciences as the key pillar, together with engineering, as one of the largest markets," he said.
Microsoft said that CCS 2003 will be priced at $469 per node when it is broadly available in August, but added that prices will vary depending on license and volume.
The release follows the launch of Microsoft's BioIT Alliance, a network of industry partners that will work with the company to ensure that its productivity software tools meet the demands of biomedical research [BioInform 04-07-06]. Faenov said that the HPC group works "very closely with the team that's driving the BioIT Alliance."
Microsoft first signaled its interest in the HPC sector with a preview of CCS 2003 at the Supercomputing 2004 conference [BioInform 11-15-04] and followed it up with a "beta 2" release at Supercomputing 2005 [BioInform 11-21-05].
The company has maintained for several years that its HPC offering would play a large role in its strategy for targeting the discovery informatics market. Last March, a company official told BioInform that Microsoft had been "slow" in approaching life science computing, and said that the firm's "initial play" in that area would be "in the high-performance computing area to drive value through those areas through clustering" [BioInform 03-21-05].
The company's goal has been to design an operating system for cluster computing that combines performance with ease of use and integrates easily with other Microsoft tools.
According to several early adopters that BioInform spoke to, CCS 2003 lives up to these expectations, although none of these users had yet run performance benchmarks on the operating system against Linux or Unix clusters.
"We have not done any benchmarks — neither numeric nor empirical — but subjectively, [CCS 2003] stays out of the way when you're doing heavy computations, and that's the only thing that's important," said Michael Athanas, a principal at bioinformatics consulting firm the BioTeam.
Ron Elber, a professor in the department of computer science at Cornell University's Computational Biology Service Unit, noted that the system should be useful for end-user biologists.
"Even the wet lab these days, because of genomics, has become data-intensive in many respects," he said. "They need their own high-performance local computing facility, which is relatively small — not necessarily the fastest on Earth, but something that will include 20 or even 40 nodes and will be able to accept genomics data — and they need it in a way that is very transparent, and that is the advantage of what we see here."
Matt Wortman, director of computational biology and IT at the Genome Research Institute at the University of Cincinnati, agreed that the system should interest scientists who aren't necessarily HPC experts.
"I don't see this replacing Linux in the supercomputer centers — those guys know what they're doing," he said. "It's the folks that are just getting into HPC for the first time, and they're not computer people — it's those people who I think will benefit. So I don't see it replacing Linux in a lot of ways, but I see it opening up a new market."
Microsoft's Faenov said that the company is indeed targeting "departmental and group" deployments for the system, which correspond to market segments with IT budgets in the range of $50 million to $250 million and under $50 million, respectively.
Nevertheless, he said, it is important for Microsoft to demonstrate its supercomputing prowess in the highly competitive HPC market, and the firm is shooting for a spot on the next Top500 supercomputing ranking. Faenov did not disclose details of the Top500 run, but noted that all of its potential HPC customers "want to make sure that they have the headroom and scalability and that the vendor knows what it takes to build large-scale systems, and that's certainly where we'll have proof points to demonstrate that."
The operating system actually debuted on the Top500 list last November when a 660-processor Dell PowerEdge cluster at the Cornell Theory Center reached the No. 310 spot.
Where Are the Apps?
One challenge for Microsoft in the bioinformatics HPC market is a dearth of life science applications that run on the new operating system.
Broad adoption of the platform in the life science market "is really going to require third-party applications to become available, or a nice open source bioinformatics package," Wortman said. "It's really going to depend on the availability of those applications."
However, he added that it's probably "just a matter of time" before these applications start emerging. "Part of the reason there are no applications is because [CCS 2003] hasn't been released yet, but I think that once it's been released — and because a lot of the bioinformatics programs are open source — I don't have any doubts that these things will appear quickly."
In fact, Elber's lab at Cornell's CBSU has begun porting to the new platform a number of bioinformatics applications that it intends to make publicly available. Several such applications, including P-Blast, P-HMMer, P-IPRSCAN, MrBayes, and MDIV, are already available for CCS 2003 via a web-based interface (http://cbsuapps.tc.cornell.edu/index.aspx).
Jaroslaw Pillardy, senior research associate at the CBSU, said that his team plans to upgrade InterProScan to the newest version in around two to three weeks, and to port LOOPP and MKPRF to CCS 2003 shortly.
In addition, the BioTeam has ported its iNquiry suite of open source bioinformatics analysis tools to CCS 2003. Athanas said that the package will be ready to ship when the operating system is broadly available.
Athanas said that porting iNquiry to the Windows platform was straightforward. "It's a very different platform than what I was used to, in terms of a Unix platform, but once I learned how to approach the platform, I found it to be very powerful in terms of development and ease of integration," he said.
The University of Cincinnati's Wortman agreed that it was "pretty easy" to port a Linux application to CCS 2003.
Wortman cited a number of other advantages of the Windows system. "We only use Linux and Unix for things that we have to," he said. "Our entire security and identity management is based on Windows, so it's kind of a pain to have these few very expensive complex machines that are not Windows in our environment."
In addition, he said, "the more heterogeneous my environment, the more support people I need. Linux HPC support people are expensive, and so far the computer cluster is just like any other part of our Windows domain. The guy who's managing it now doesn't have HPC experience, and that was key."
While Microsoft is going head-to-head with Unix and Linux in the HPC market — and will certainly find many entrenched Linux users in the academic bioinformatics sector — Faenov said that he has been pleasantly surprised to find that many academic researchers are actually "very agnostic in terms of their platform — they just want to get their research done."
Faenov acknowledged that there are many labs that "have access to graduate students that have computer science expertise and they can really spend the time building up a Linux cluster from scratch and make the necessary modifications and figure out which MPI stack to run and how to configure the job scheduler and what's the proper security model," but noted that "a lot of folks just want to get their science done, and it becomes a very attractive proposition for them to use a familiar environment and drop in a cluster they can get up and running pretty quickly with their existing skill set."
— Bernadette Toner ([email protected])