Last week was a big one for grid computing. Just days after ten leading information technology firms announced their adoption of the open source Globus Toolkit as a standard grid technology platform, MCNC, IBM, and over 60 organizations in the North Carolina Genomics and Bioinformatics Consortium said they are building the first computer grid to be wholly devoted to life sciences research.
The North Carolina Bioinformatics Grid will enable thousands of life sciences researchers in the state to share computing power, data storage, networking resources, and data.
In addition to linking the current compute resources of individual sites, the BioGrid will draw heavily on the North Carolina Supercomputing Center and the North Carolina Research and Education Network, which are both operated by MCNC, a non-profit IT and telecommunications corporation.
The NC BioGrid infrastructure will also include IBM's teraflop-scale eServer p690 and a storage area network based on IBM's Shark Enterprise Storage Server and Tivoli Storage Manager. The Shark configuration is expected to provide a storage capacity of more than a petabyte of data.
The IBM equipment will be used as part of the grid network, but will also store duplicate files of information contained at individual sites for security purposes.
Distributed Computing for Distributed Research
"The North Carolina bioresearch community is very vibrant, but very distributed," said Thomas Dunning, vice president of high-performance computing and communications at MCNC.
Dunning said the NC BioGrid had its roots in the formation of the NCGBC, which has over 60 member organizations from the state's public and private sectors, including the University of North Carolina system, Duke University, GlaxoSmithKline, the Research Triangle Institute, SAS Institute, Biogen, and the National Institute of Environmental Health Sciences.
As MCNC was considering options for the consortium's information technology infrastructure, Dunning said grid technology soon made sense as the best way to tie all these resources and technology together.
MCNC is providing the startup funding for the three-year project through its endowment fund. Other collaborators are expected to contribute funding as the project progresses, Dunning said.
While the high-energy physics research community has employed grid technology for some years, the NC BioGrid is the first application of the approach to life sciences research. Steven Beckhardt, chief architect of IBM Life Sciences, said that the NC BioGrid is the first attempt to commoditize the vision of the grid and make it available for a broad range of applications.
The BioGrid is also unique in its application to both private and public research activities. High-energy physics research is carried out in the academic and government sectors, where researchers' security concerns about competing groups having access to their data pale in comparison to the security fears of pharma and biotech.
But both Beckhardt and Dunning said that MCNC and IBM are working hard to secure the BioGrid using public and private key cryptography.
Beckhardt said the NC BioGrid is expected to be only the first in a nationwide network of grids that will be linked together, much as the Internet today is a network of networks.
The Globus Bandwagon
Dunning said the first phase of the BioGrid project, an eighteen-month test-bed phase, would evaluate available grid technologies. "In addition to the potential security dangers and the risk of lost data, there is the concern about the delivery of the promise of grid technology," he said. "The goal is to make it seamless so it can provide tremendous benefits to the researchers and they don't have to worry about IT and database management."
While the Globus Toolkit is under consideration, Dunning said the BioGrid collaborators have not committed themselves to adopting the platform, despite IBM's previous endorsement of the technology.
Most current grid efforts are already based on Globus, which became the de facto standard after last week's adoption by Compaq, Cray, Platform, SGI, Sun Microsystems, Hitachi, and NEC. IBM, Entropia, and Microsoft had previously committed to the Globus platform. "We would gain tremendous advantages using the same base as other efforts," Dunning said. "But does Globus do everything you need it to do? No."
The Globus Toolkit, developed by the Information Sciences Institute at the University of Southern California and Argonne National Laboratory, is an open source set of protocols, services, and tools to enable secure interoperation of grid systems.
The NC BioGrid team considers the platform a good starting point for its own infrastructure efforts, which will evolve as the project progresses, Dunning said.
"The BioGrid is a multi-institutional activity based in the [NCGBC] consortium," Dunning stressed. "Although IBM is a significant partner in the effort, they will not be our only partner," he said. The various sites to be linked by the BioGrid each have their own platforms, and the grid will be the unifying factor that ties all those things together.
BT