Microsoft unveiled its plans for entering the high-performance computing market at Supercomputing 2005 last week with the second beta release of its Windows Server operating system designed expressly for clustered computing.
The release of the system, officially known as the beta 2 version of Windows Compute Cluster Server 2003, also signals a new era in Microsoft's strategy for targeting the discovery informatics market — a sector that the company has been flirting with since 2001, when it first formed its healthcare and life sciences group [BioInform 05-07-01], but has yet to win over.
The software giant faces some resistance from a bioinformatics community that has historically been wary of Windows-based cluster computing, but company officials are confident that the new platform will win over most skeptics.
"Something that we've been missing in the life sciences is a real entrée into the drug-discovery realm," Les Jordan, industry technology strategist for healthcare and life sciences at Microsoft, told BioInform last week. "It's not something that we've been really focused on previously, but with this application, and the ability to make the application more seamless, more user friendly … we have a real good entrée into the market," he said.
Jordan added that within the pharmaceutical industry, Microsoft has "traditionally been pretty deep in the drug-development and sales and marketing realms, but in terms of the whole value chain of life sciences, where we haven't focused, and where we really haven't had much of a play, frankly, is in the discovery area."
In March, Microsoft launched its so-called Digital Pharma initiative, a "solutions framework" that relies on open standards, web services, and products from Microsoft and its partner companies to help pharmaceutical firms collaborate and share information [BioInform 03-21-05]. At the time, Paul Mattes, a Microsoft sales and industry strategist, told BioInform that Digital Pharma would eventually address IT challenges across four segments of the pharma pipeline — drug discovery, drug development, supply chain and manufacturing, and sales and marketing — but that the company had identified only drug development and sales and marketing for the initial stage of the effort.
One reason for this, Mattes said in March, was that the company's presence in discovery depended heavily on an HPC offering — a domain that he admitted the company had been "slow" in approaching.
Jordan noted last week that Microsoft now has a number of activities related to discovery informatics in addition to its HPC platform, including several bioinformatics projects within Microsoft Research as well as a nascent effort to create a "bio-IT alliance" of software developers, instrumentation vendors, and end users who will share best practices. "So I think the confluence of events that you're seeing — all under the umbrella of Digital Pharma — is really allowing us to move into that area. And HPC is just a great way to do that," he said.
While the company is clearly serious about selling its HPC platform into the scientific market — Bill Gates delivered a keynote entitled "The Role of Computing in the Sciences" at the conference last week — the system has been a long time coming.
A year ago, at Supercomputing 2004, Microsoft first demonstrated a version of Windows Server for clustering and said that the life sciences was one of three markets that it was targeting for the system [BioInform 11-15-04]. The first beta version of the platform was released in September and is currently being tested by 1,600 organizations, while beta 2 is "the first truly public beta where we are feature-complete," said John Borozan, group product manager for the Windows Server division.
A production version of the system is slated for release in the "first half" of 2006. Company officials were unable to provide a more specific launch date or pricing information.
The HPC 'Appliance'
Borozan described Microsoft's overall HPC strategy as "PC economics catching up with supercomputing." As the cost of compute clusters built with off-the-shelf components continues to fall, Microsoft expects to see an "explosion of HPC clusters," Borozan said, with the quickest growth in the so-called "departmental" and "workgroup" market segments with IT budgets in the range of $50 million to $250 million and under $50 million, respectively.
But even with falling hardware costs, "deploying and managing a cluster today remains incredibly hard," Borozan said. "In a typical cluster deployment today, the user has to pull together a bunch of disparate components from multiple sources — so they've got to get their hardware and their interconnect and they've got to get their OS and they've got to get an MPI [message passing interface] software layer to coordinate the communication between nodes and then they need to have some kind of a job scheduler from a third party — they don't have one source that supports them, and in particular if they go pure open source without going through a vendor where they've paid for support, then they're kind of on their own."
Borozan said that Microsoft's goal with Windows Compute Cluster Server 2003 was to create a "complete integrated stack" of these components that would be easier — and cheaper — for users to install. Another expected advantage of the system, he said, is that it will be interoperable with the ubiquitous Windows desktop.
The Cluster Server software "is very well equipped for building an appliance model of computing," said Michael Athanas, founding partner of bioinformatics consulting firm the BioTeam, which has ported its iNquiry suite of 150 bioinformatics applications to the platform.
Borozan said that the company wants "to make compute clusters a networked and shared resource the way printers are today," and noted that some of the firm's HPC partners, such as the MathWorks, are already adapting their software to support this so-called appliance model. "In Matlab, for example, you go to the file menu and you choose 'compute,' and it finds the compute cluster on the network and sends the job that way and sends it back to you."
This setup is expected to appeal to biologists who are struggling with growing data-analysis tasks, Borozan said. "Today what a lot of them find is that they have to become computer scientists on the side to be able to put together a cluster and develop the code that they want to run on it, and then sort of go back to their day job of being biologists or chemists."
Putting Windows to the Test
Eric Schadt, senior scientific director for genetics research at Merck and a beta-tester for the platform, said that deployment of a Windows cluster was "relatively painless, given what the Linux world takes."
Schadt said that his research team currently relies on a 700-node cluster of IBM blades running Linux to support a suite of algorithms developed in-house for network reconstruction. The problem, he said, "is that all of our prototyping and development takes place under Windows," which they then have to port to the Linux cluster, "and then all those results that we get in Linux, we're pulling back to Windows and storing them in SQL Server, so there's this iterative process that takes much longer than it should because we're moving between these two environments."
The "key motivator" for giving Windows a try, he said, "was to see if we could get the Windows cluster environment going … and see whether that could cut down on the time we spent moving between platforms."
Schadt noted that it's too early to answer that question definitively because his team is still porting its applications to a 40-processor cluster of 64-bit machines running Windows Compute Cluster Server 2003.
Schadt said his team intends to test the cross-platform interoperability of the system once it is up and running to determine if the Windows cluster could share computing jobs with the Linux cluster. "If it facilitates the grid computing paradigm, then that would be a major win for us, so that's something we're very eager to explore," he said.
The Merck team is also "trying to get Microsoft excited about trying to port R to the 64-bit platform," he said.
Athanas of the BioTeam was hesitant to evaluate the performance of the Windows system, noting that it is still a beta release. "However, I've been quite pleased with the comparative performance of things like Blast on the [Windows] platform compared to Linux," he said. "I don't think the operating system is a barrier to performance."
Borozan said that Microsoft expects the performance of the system to be "on par with Linux and a Linux-based stack. We don't look at this as our core value proposition in this space to say, 'Look, we're twice as fast as the other guys,' but being as fast is important, because nobody wants to make a compromise in that area."
The advantage that Microsoft does expect to have over competing platforms, Borozan said, "is ease of deployment, ease of management, and those other things that are currently the headache."
Will People Buy It?
But the company does face some hurdles as it tries to establish a foothold in the life science HPC market, which has strong roots in Unix and Linux.
Borozan acknowledged that the life science market has embraced Linux "to a much larger extent" than other vertical markets — a situation he attributed to the fact that "this space grew out of academia, where Unix was already very common and Linux was very common, and where a 'no-cost' operating system is going to have a natural appeal to some folks."
Nevertheless, he claimed that the total cost of ownership for the Microsoft system will be much lower than other platforms, "because a big part of that cost goes into the ongoing care and feeding of the system and the management and so on."
The company has not yet published a TCO analysis for the platform, however. "To do one now would be sort of premature because it would really only be focused on the application costs and not much else," Borozan said. He added that in an internal study to determine how long it took to deploy a four-node cluster on Linux and Windows, "our finding was very, very encouraging."
Microsoft also has some catching up to do on the applications side. At the Supercomputing conference, the company announced that it is working with 19 commercial partners and 10 academic centers to ensure that plenty of scientific applications run on the system once it is released. "In life sciences in particular, the ISVs and applications are much more fragmented than they are in, say, manufacturing," Borozan said, noting that the company is relying on partnerships with the BioTeam, Accelrys, and the Cornell Theory Center to help give it a "head start" in porting life science applications to the platform.
Another challenge, BioTeam's Athanas said, "is going to be how to accommodate those who are comfortable in Unix — how to make those Unix-type people feel comfortable enough to take a risk using this platform."
Athanas noted that "if you're stuck in a Unix mindset, it could be a little bit challenging, but if you're open to the paradigm that Microsoft introduces in its development environment, which is different from Unix, there's a whole lot of tools and capabilities for rapid application development."
The bioinformatics community has been skeptical about Microsoft's chances of gaining a foothold in the cluster market, and Athanas said that many in the field have expressed doubts since the firm first announced its intentions a year ago. Nevertheless, he said, "I've spoken to several colleagues in the clustering and bioinformatics arena, and we're all in surprising agreement that this is an excellent option and direction for Microsoft — an option for us, and a really good direction for Microsoft."
— Bernadette Toner ([email protected])