IBM is building a 7.5-teraflop computing cluster for Atlanta-based NuTec Sciences, a move that will give IBM a “distinct advantage” in the genomics marketplace, a company spokesman said.
The deal represents IBM’s latest step to position itself as a major player in the genomics market. In August, the company announced an initiative to invest $100 million in the development of IT solutions for processing genomic data. Since then, the company’s life sciences business unit has also formed technology partnerships with Incyte Genomics, First Genetic Trust, and Structural Bioinformatics.
IBM spokeswoman Theo Chisholm said last week that the company also intends to augment its initial life sciences investment by up to $100 million over the course of the year.
As part of the partnership, NuTec is also developing a “seamless informatics” system for IBM, a software package that will allow users to integrate genetic data with medical records. Jamie Coffen, a spokesman for IBM’s life sciences business unit, said that the software should be available in May.
Coffen said the deal would give IBM a “distinct advantage in the marketplace” because of NuTec’s existing relationships with the National Institutes of Health and the National Human Genome Research Institute, with which it is co-developing algorithms to process genetic data.
The NuTec computing cluster will have a processing capacity of 7.5 trillion calculations per second, making it the fastest non-governmental system, IBM said. Within the cluster, 1,250 IBM eServer p640 devices will run IBM’s DB2 Universal Database, supported by 2.5 terabytes of memory, 50 terabytes of online disk storage and a high-bandwidth networking infrastructure.
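As a back-of-the-envelope check on these figures, the cluster totals can be divided across the 1,250 servers to get a rough per-server share. The sketch below assumes an even split across servers and decimal units (1 tera = 1,000 giga); IBM specified neither, and the function name is illustrative only.

```python
# Cluster totals as reported by IBM.
SERVERS = 1250
TOTAL_TERAFLOPS = 7.5
TOTAL_MEMORY_TB = 2.5
TOTAL_DISK_TB = 50


def per_server(total_tera, servers=SERVERS):
    """Return the per-server share in giga-units.

    Assumes an even split across servers and decimal prefixes
    (1 tera = 1,000 giga). Both are assumptions, not IBM figures.
    """
    return total_tera * 1000 / servers


print(f"Compute per server: {per_server(TOTAL_TERAFLOPS):.1f} gigaflops")
print(f"Memory per server:  {per_server(TOTAL_MEMORY_TB):.1f} GB")
print(f"Disk per server:    {per_server(TOTAL_DISK_TB):.1f} GB")
```

Under those assumptions, each server would account for roughly 6 gigaflops, 2 GB of memory, and 40 GB of disk.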
The cluster system will include IBM software for Web application serving, information portals, and data integration. NuTec will use the system to manage, mine, and integrate genetic data from a wide variety of sources, and share this information via the Internet with the global life sciences community.
NuTec plans to run several massively parallel applications on the cluster. A combinatorics algorithm that NuTec is developing in collaboration with the NIH to analyze disease-causing gene combinations is particularly compute-intensive, according to Peter Morrissey, president of NuTec’s life sciences division. The algorithm is currently running on a test set on the company’s IBM computer in Houston, but Morrissey said NuTec is awaiting delivery of the supercomputer before scaling it up to optimal efficiency.
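The article does not describe the NuTec/NIH algorithm itself, but a simple count suggests why searching gene combinations is compute-intensive: the number of possible k-gene subsets grows very quickly with k. The sketch below is only an illustration of that growth, not NuTec's method, and the gene count of 30,000 is a hypothetical round figure rather than a number from NuTec or the NIH.

```python
from math import comb  # available in Python 3.8 and later

# Hypothetical round figure for illustration; not from the article.
HYPOTHETICAL_GENE_COUNT = 30_000


def combinations_to_test(genes, k):
    """Count the distinct unordered sets of k genes drawn from `genes`."""
    return comb(genes, k)


# Show how the search space grows as more genes are considered together.
for k in (1, 2, 3, 4):
    count = combinations_to_test(HYPOTHETICAL_GENE_COUNT, k)
    print(f"{k}-gene combinations: {count:,}")
```

Even at pairs and triples the count runs into the hundreds of millions and trillions, which is consistent with Morrissey's description of the workload as one that needs a larger machine to scale.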
NuTec plans to rent time on the machine to a client base of academic research centers, biotech firms, and large pharmaceutical companies. Clients can use NuTec’s algorithms, or they can use the facility to run their own compute-intensive analyses on proprietary datasets in a secure environment.
The pricing schema for these capabilities has not yet been determined.
NuTec expects the first quarter of the machine’s capacity to be delivered by the end of the calendar year, with the rest delivered in phases over the following six to nine months. Morrissey said the first phase should be functional within 30 to 40 days of delivery, with 1.8 teraflops up and running by mid-February.
Morrissey said that NuTec benchmarked a number of hardware systems at its supercomputer center in Houston, including systems from Sun Microsystems, Hitachi, and Compaq, as well as Linux clusters, and determined that IBM offered the most efficient platform for NuTec’s applications in terms of performance and cost. He declined to offer any details.
Some market watchers speculated that IBM, whose computers have historically been among the more expensive, might have cut NuTec a deal in order to secure a stronger foothold in the genomics sector.
Coffen denied that the company is playing catch-up within the sector. “I think that the genomics market is just taking off right now,” he said. “The Human Genome Project was an important step, but genomics is just a first step in this process. There’s proteomics and other areas that you have to move into now to understand how these complex diseases take place in the body.”
He thinks that the informatics software that NuTec is developing will be a key component of IBM’s future market position. “A big problem in all the medical centers right now is taking advantage of the large amounts of clinical data along with the huge amounts of genetic data that we’re getting now, and being able to link those together. This problem is really the next frontier of biological science,” Coffen said.
—Bernadette Toner