Last week, the Computation Institute, a joint initiative between the University of Chicago and Argonne National Laboratory, announced that it had purchased a 150-teraflop, 18,000-core Cray XE6 supercomputer dubbed "Beagle" that it will use for computation, simulation, and data analysis in basic, translational, and clinical research projects.
The new system, named after the ship that carried Charles Darwin on his famous scientific voyage in 1831, was purchased with a $7 million grant from the National Center for Research Resources arm of the National Institutes of Health and will be installed at Argonne’s Theory and Computing Sciences building in December. It is expected to be in full production by the second quarter of 2011.
The Computation Institute already manages two life science clusters, one with 752 CPUs and one with 32 CPUs, for the University of Chicago's Initiative in Biomedical Informatics. However, several biomedical research projects at the university were "hindered by the lack of access to sufficient computing power," Ian Foster, director of the Computation Institute, told BioInform. To move these projects along faster, Foster's team applied for the grant in partnership with colleagues at Cray.
"The approach was to find a set of particularly compelling applications across the university that had particularly urgent requirements for computing," he explained. "Then find a computing platform that met those needs with reasonably good fit … and it will be shared among different groups … [that] will use it to varying degrees."
Beagle combines AMD multicore processors with Cray’s Gemini system interconnect, which uses a 3D torus topology, and according to Cray the XE6 architecture can scale to more than 1 million processor cores.
Foster said that his team considered systems from five different vendors, as well as two hybrid systems, but selected the XE6 "because it met the different range of requirements that we had."
Although many supercomputers today are architecturally similar, he explained, they differ in details such as the type of microprocessor used and the extent to which the network supports the high-speed communications needed for computational modeling.
As an illustration, Foster said that regular communication between nodes is "vital" for research such as modeling blood flow through the body, because "what happens in one part of the body influences what happens elsewhere."
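To make the point about node-to-node communication more concrete, here is a minimal sketch of how nodes in a generic 3D torus (the topology named for Beagle's interconnect) locate their six nearest neighbors, and how the wraparound links shorten the path between distant nodes. It rests on stated assumptions: the 8×8×8 grid and the function names `torus_neighbors` and `hop_distance` are hypothetical, and the code does not model Cray's actual Gemini routing.

```python
# Hypothetical illustration of a generic 3D torus; NOT Cray's Gemini routing.
# The grid size used below is made up and is not Beagle's real layout.

def torus_neighbors(x, y, z, dims):
    """Return the six nearest neighbors of node (x, y, z) in a 3D torus.

    Coordinates wrap around at the edges (modulo each dimension), so every
    node has six neighbors when each dimension is at least 3.
    """
    nx, ny, nz = dims
    return [
        ((x - 1) % nx, y, z), ((x + 1) % nx, y, z),
        (x, (y - 1) % ny, z), (x, (y + 1) % ny, z),
        (x, y, (z - 1) % nz), (x, y, (z + 1) % nz),
    ]

def hop_distance(a, b, dims):
    """Minimum number of link hops between nodes a and b, using wraparound links."""
    total = 0
    for ai, bi, n in zip(a, b, dims):
        d = abs(ai - bi) % n
        total += min(d, n - d)  # go the short way around each ring
    return total

if __name__ == "__main__":
    dims = (8, 8, 8)  # hypothetical 512-node torus
    print(torus_neighbors(0, 0, 0, dims))
    print(hop_distance((0, 0, 0), (7, 7, 7), dims))
```

In a mesh without wraparound, the corner nodes (0, 0, 0) and (7, 7, 7) would be 21 hops apart; the torus links cut that to 3, which helps when a simulation must regularly exchange data between regions mapped to distant nodes.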
Beagle has 1,500 nodes with 32 gigabytes of memory per node and a parallel file storage system containing, initially, 500 terabytes of shared storage across all nodes as well as a tape storage system for long-term storage. The platform has about six racks and covers about 96 square feet of floor space.
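The node and memory figures above imply a few back-of-the-envelope numbers. The sketch below derives them only from values quoted in this article; it assumes decimal units (1 TB = 1,000 GB) and treats the 18,000-core figure as approximate, so the cores-per-node and memory-per-core values are inferences, not specifications.

```python
# Back-of-the-envelope figures derived only from numbers quoted in the article.
# Assumptions: decimal units (1 TB = 1,000 GB); 18,000 cores is an approximate total.
NODES = 1_500          # compute nodes
MEM_PER_NODE_GB = 32   # memory per node, in gigabytes
TOTAL_CORES = 18_000   # approximate core count quoted for the system

def aggregate_memory_tb(nodes, mem_per_node_gb):
    """Total memory across all nodes, in terabytes."""
    return nodes * mem_per_node_gb / 1_000

def cores_per_node(total_cores, nodes):
    """Average number of cores per node (inferred, not a published spec)."""
    return total_cores / nodes

def memory_per_core_gb(mem_per_node_gb, cores):
    """Memory available to each core if a node's memory is split evenly."""
    return mem_per_node_gb / cores

if __name__ == "__main__":
    cpn = cores_per_node(TOTAL_CORES, NODES)
    print(f"aggregate memory: {aggregate_memory_tb(NODES, MEM_PER_NODE_GB):.0f} TB")
    print(f"cores per node:   {cpn:.0f}")
    print(f"memory per core:  {memory_per_core_gb(MEM_PER_NODE_GB, cpn):.2f} GB")
```

Run as a script, it reports roughly 48 TB of aggregate memory, about 12 cores per node, and about 2.67 GB of memory per core.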
Although he could not name specific computer vendors, Foster said his team had considered "special-purpose systems" that provided high performance in some applications but were more difficult to program, as well as cluster-based systems, which he said were more "general purpose" and cheaper but proved to be less suitable for computationally intensive projects such as biological modeling.
Ahead of Beagle's arrival, the University of Chicago is providing training for its staff to navigate the "peculiarities" of the computer's software, Foster said. For example, Beagle contains specialized software to manage communication between processors and system-management software for governing usage priorities.
The university is also setting up new cooling systems and electrical power connections – it estimates that Beagle will have a maximum power use of 348 kilowatts. It is also reaching out to developers to ensure that their applications are Beagle-ready so that the system can be "heavily used as soon as it becomes generally available."
In addition, Foster said that he plans to hire additional staff, including a system operator and an application consultant. Cray will also be on hand to provide additional technical assistance.
Beagle's First Steps
Although life science projects will be given priority on Beagle, Foster said that there is a possibility for projects in other disciplines to use the platform once it goes into production.
So far, 12 NIH-funded biomedical research teams have been selected to put Beagle through its paces beginning on Feb. 12 next year, which coincides with the 202nd anniversary of Charles Darwin's birth.
Selected projects include research into the prevention and treatment of cancer, improved management of burn victims, drug design, genetics and inherited disorders, and personalized medicine.
Projects had to be NIH-funded and have large-scale computing requirements “that could lead to rapid advances in sciences that wouldn’t otherwise be possible,” Foster said. Going forward, project selection won’t be limited to University of Chicago researchers, although they will receive priority, according to Foster.
“Ultimately we want to be able to show that advanced computing can make a difference in the biomedical sciences and so if other projects come forward we can make room for them,” he said.
Computational methods are widely used in biomedical sciences, he noted, but not all projects require large-scale computers and thousands of processors “to move their science forward.” Going forward, these are the groups that Foster plans to reach out to.
The partners expect that the selected studies will provide computational scientists, high-performance computing consultants, and systems administrators "with a rich proving ground" in order to "finely tune" Beagle for a range of biomedical research applications.
“We are going to be working on how to optimize the various aspects of the Beagle software configuration to [meet] the needs of life sciences applications,” he explained.
This includes configuring the file system to optimize performance for life science application workflows.
“We are particularly going to be using this machine … to do more data-intensive computer applications,” he said. As a result, “we will be putting a lot of effort into understanding and optimizing the file system in the parallel input and output libraries.”
He also said that the team may need to tune some computational libraries to work well on the Cray system and that, over time, it will explore opportunities to increase Beagle’s performance with hardware enhancements, such as graphics processing unit (GPU) coprocessors.
About half of the selected projects aim to understand biomedical systems by building computer models, and the other half focus on analyzing large quantities of genomic data to try to identify patterns, Foster said.
For example, Kevin White, director of the Institute for Genomics and Systems Biology at the University of Chicago and Argonne, is researching how transcriptional regulatory networks influence the development of cancer.
He told BioInform via e-mail that his team plans to use Beagle "to run large-scale simulations of network topologies and the transfer of information through networks, using probabilistic models."
Bob Eisenberg, a professor in the department of molecular biophysics and physiology at Rush University, is studying ion channels, which he said are “as important in living systems as transistors are in computers.”
Eisenberg's team has found "that the equations of transistors work well in ion channels if they are modified to include the finite size of ions," he explained via e-mail to BioInform. "In fact, the ions of channels form a plasma of very high density in which crowded ions are balanced between electrostatic forces and steric repulsion. Ions cannot overlap; they are hard balls that repel each other with steric forces."
He said that "Beagle will be used to compute these large balanced forces using a variational principle of great power … to solve a variety of biological problems, from current flow in the calcium channel of the heart, to signaling in the axons of our nervous system, to seemingly unrelated problems at the cellular and tissue level."
Other projects include identifying novel microRNA targets for cancer therapy and computer-aided diagnosis of breast, lung, colon, and prostate cancers using imaging biomarkers.
Beagle is expected to be ranked among the top 50 fastest supercomputers in the world and to be one of the fastest systems for the life sciences, Foster said.
Currently that honor for life sciences-focused supercomputers belongs to a 97.1-teraflop, 18,176-core HP cluster at Pacific Northwest National Laboratory’s Environmental Molecular Sciences Laboratory, which holds the No. 57 spot in the latest version of the Top 500 computer ranking, released in June (BI 06/04/2010). The next version of the list will be released at the 2010 Supercomputing Conference in mid-November.
Comparing Beagle’s speed with those of systems on the current Top 500 list, Foster speculates that it will rank somewhere between the University of Tennessee’s Athena, a Cray XT4 QuadCore 2.3 gigahertz computer that clocked in at 165 teraflops for the No. 36 spot on the June list, and the Japan Agency for Marine-Earth Science and Technology’s Earth Simulator, an NEC SX-9/E/1280M160 vector system capable of about 131 teraflops, which is No. 37 on the current list.
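Foster's estimate amounts to slotting one speed figure into an ordered list. The minimal sketch below does exactly that with the figures quoted in this article; the function name `place_between` and the list `JUNE_2010_EXCERPT` are hypothetical. Keep in mind that the Top 500 ranks systems by measured Linpack (HPL) performance, so comparing a quoted speed against list entries is only a rough guide to where a new machine will land.

```python
# Illustrative only: bracket a quoted speed between neighboring list entries.
# Figures come from this article; the real Top 500 ranks by measured Linpack (HPL)
# results, so this comparison is a rough estimate, not a prediction method.

def place_between(entries, tflops):
    """Return the (faster, slower) entries that bracket `tflops`.

    `entries` is a list of (rank, name, tflops) tuples sorted by rank.
    Either element of the result is None if `tflops` falls outside the range.
    """
    faster = None
    for entry in entries:
        if entry[2] > tflops:
            faster = entry
        else:
            return faster, entry
    return faster, None

JUNE_2010_EXCERPT = [
    (36, "Athena (University of Tennessee, Cray XT4)", 165.0),
    (37, "Earth Simulator (JAMSTEC, NEC SX-9)", 131.0),
    (57, "EMSL HP cluster (Pacific Northwest National Laboratory)", 97.1),
]

if __name__ == "__main__":
    above, below = place_between(JUNE_2010_EXCERPT, 150.0)
    print(f"Beagle (~150 TF) would sit between No. {above[0]} and No. {below[0]}")
```

With Beagle's quoted 150 teraflops, the sketch places it between No. 36 and No. 37, matching Foster's estimate.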
While there aren’t any immediate plans to connect Beagle to TeraGrid, Foster said that because many users run their computations on multiple supercomputers at different times, he “suspects that we will run many of the services that TeraGrid uses,” which would make it easier to move data between TeraGrid nodes and Beagle.