TeraGrid Will Be Boon to Biologists Hungry For Compute Power and Storage Capacity

The Distributed Terascale Facility, or “TeraGrid”—a multi-site supercomputing system to be built and operated with a recent $53 million award from the National Science Foundation—could benefit biological research even more than the scientific fields that have relied on supercomputers for decades, according to some observers.

“Biocomputing is probably the fastest growing component of our program currently,” said Bob Borchers, division director of advanced computational infrastructure and research (ACIR) at the NSF. “It used to be all physics and chemistry, but now biology’s really coming in strong.”

Borchers estimated that between 25 and 30 percent of supercomputer time that the ACIR oversees is used for biological computing, with genomics, proteomics, and protein folding projects consistently at the top of the usage lists. “We expect this usage to grow significantly,” said Borchers.

In addition to biology’s growing dependence on high-performance computing, Borchers predicted that the architecture of the proposed DTF would also draw more biologists to the resource. “These new machines are based on commodity hardware and commodity software, so there’s a lot of computing experience in the biology community on this kind of architecture. We expect those people when their problems outgrow their home systems to move over to the TeraGrid,” said Borchers.

The DTF will offer more than 13.6 teraflops of computing power and store more than 450 trillion bytes of data. The TeraGrid infrastructure will link computers, visualization systems, and data at four sites — the National Center for Supercomputing Applications in Illinois, the San Diego Supercomputer Center in California, Argonne National Laboratory outside Chicago, and the California Institute of Technology in Pasadena — through a 40 gigabit-per-second optical network that will eventually be upgraded to 50-80 gigabits per second.

The partnership will work with IBM, Intel, and Qwest Communications to build the facility, along with Myricom, Oracle, and Sun Microsystems.

Linux clusters purchased through the DTF award and distributed across the four sites will total 11.6 teraflops of computing power. Two one-teraflop Linux clusters already in use at NCSA will also be integrated, creating a 13.6-teraflop system that project officials say will be the most powerful distributed computing system ever built.

In addition, ACIR program director Richard Hilderbrandt noted, “The TeraGrid is not just about high-performance computing. It’s also about databases, data storage, and large archival storage.”

Hilderbrandt said that researchers who use the Protein Data Bank and other biological databases housed at the SDSC would benefit from this increased storage capacity.

The SDSC’s already strong ties to biological computing and database maintenance are expected to grow even stronger as a result of the DTF. Andrew McCammon, a professor of pharmacology at the University of California, San Diego, recently used the SDSC facility to run a new method for modeling the electrostatic properties of cellular microtubules and ribosomes, and said he is looking forward to the enhanced computational capacity.

McCammon parallelized the solution of the Poisson-Boltzmann equation, dividing the computation among 700 processors. While current methods of solving the equation have been limited to molecules of 50,000 atoms, McCammon was able to model a 1.25-million-atom microtubule with his approach. Once the DTF comes online, McCammon said, “Calculations involving tens of millions of atoms will likely be possible.”
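The article does not describe McCammon’s algorithm in detail, but the general strategy of splitting a grid-based electrostatics calculation across many processors can be illustrated with a toy example. The sketch below is a hypothetical slab decomposition of an ordinary finite-difference Poisson solve using Jacobi iteration and mpi4py; it is not the published method, and the grid size, sweep count, charge placement, and file name are arbitrary choices for illustration.

```python
# Toy illustration only: slab decomposition of a finite-difference Poisson
# solve, the general strategy for spreading grid-based electrostatics over
# many processors. Not McCammon's published algorithm.
# Run with, e.g.:  mpiexec -n 4 python poisson_slabs.py
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

N = 64                    # global grid points along the decomposed axis (arbitrary)
h = 1.0 / (N - 1)         # grid spacing
local_n = N // size       # slab thickness per process (assumes size divides N)

# Each process holds its own slab plus one ghost plane on either side.
phi = np.zeros((local_n + 2, N, N))   # electrostatic potential
rho = np.zeros_like(phi)              # charge density
if rank == size // 2:                 # drop a single point charge into one slab
    rho[local_n // 2 + 1, N // 2, N // 2] = 1.0 / h**3

up, down = rank + 1, rank - 1         # neighboring slabs along the split axis

for _ in range(200):                  # fixed number of Jacobi sweeps (arbitrary)
    # Swap ghost planes so each slab sees its neighbors' boundary values.
    if up < size:
        comm.Sendrecv(phi[-2], dest=up, recvbuf=phi[-1], source=up)
    if down >= 0:
        comm.Sendrecv(phi[1], dest=down, recvbuf=phi[0], source=down)

    # Jacobi update of interior points for -laplacian(phi) = rho; a linearized
    # Poisson-Boltzmann solve would add an ionic screening term here.
    phi[1:-1, 1:-1, 1:-1] = (
        phi[:-2, 1:-1, 1:-1] + phi[2:, 1:-1, 1:-1] +
        phi[1:-1, :-2, 1:-1] + phi[1:-1, 2:, 1:-1] +
        phi[1:-1, 1:-1, :-2] + phi[1:-1, 1:-1, 2:] +
        h**2 * rho[1:-1, 1:-1, 1:-1]
    ) / 6.0

peak = comm.allreduce(float(phi[1:-1].max()), op=MPI.MAX)
if rank == 0:
    print(f"peak potential after 200 sweeps: {peak:.4e}")
```

Production solvers use far better iteration schemes than Jacobi, but the communication pattern shown here (each processor updates only its own subgrid and trades boundary planes with its neighbors) is what allows this kind of calculation to scale to hundreds of processors.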

And McCammon’s work (which recently appeared in the Proceedings of the National Academy of Sciences and will soon be freely available to the scientific community as a software implementation) is only one of many new computational biology methods being developed to scale with the number of available processors.

“Together with fast methods that have been developed for calculating forces as well as electrostatic energies, these will open the way to simulations of the dynamic properties of such cellular components as DNA replication and transcription complexes,” said McCammon.

— BT

The Distributed Terascale Facility is expected to perform 11.6 trillion calculations per second and store more than 450 trillion bytes of data by April 2003. Each of the four contributing sites will play a unique role in the project:

National Center for Supercomputing Applications

• IBM Linux cluster

• Intel 64-bit Itanium McKinley processors

• Peak cluster performance of 6.1 teraflops

• Cluster will work with existing hardware to reach 8 teraflops

• 240 terabytes of secondary storage

 

Argonne National Laboratory

• IBM Linux cluster

• Peak performance of 1 teraflop

• Will host advanced software for distributed computing, high-resolution rendering, and remote visualization

California Institute of Technology

• IBM Itanium McKinley cluster

• Peak performance of 0.4 teraflop

• 32-node IA-32 cluster will manage 86 terabytes of online storage

 

San Diego Supercomputer Center

• IBM Linux cluster

• Intel’s McKinley processor

• Peak performance of 4 teraflops

• 225 terabytes of storage

• Sun high-end server will manage access to Grid-distributed data
