NIH’s Advanced Biomedical Computing Center throws a little bit of everything at genomics computation
The Advanced Biomedical Computing Center in Frederick, Md., has a computing system as diverse as its customers. The center supports the research of all of the National Institutes of Health as well as several subscribing nonprofit groups. Of its 1,800 registered users worldwide, about 100 are typically logged on at any given time.
Each user works on a different genomics or proteomics project optimized for a different platform, says Stanley Burt, the center’s director. “Our people are doing research that would take 50 years without fast computing. But one type of architecture doesn’t serve all. We match the science to the platform,” he says.
“We don’t have as many CPUs as many other labs. Instead, we have lots of different platforms that have different features,” Burt adds. “And we don’t have a single job running on two different machines together.”
Most recently, ABCC acquired a second Sun Enterprise 3500 server that will be dedicated to managing high-throughput sequence analysis data with Geospiza’s Finch suite. Using this application, scientists will be able to submit samples for molecular sequencing and access archived sequences in many formats.
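As a rough illustration of what multi-format sequence retrieval involves, here is a minimal, hypothetical sketch of parsing one archived record and rendering it in two common text formats. The function names and record content are invented; this is not Geospiza's code.

```python
# Hypothetical sketch of serving one archived sequence record in more than
# one text format. The record content is invented; not Geospiza Finch code.

def parse_fasta(text: str) -> dict:
    """Parse a single-record FASTA string into {'id': ..., 'seq': ...}."""
    lines = text.strip().splitlines()
    return {"id": lines[0].lstrip(">").split()[0],
            "seq": "".join(lines[1:])}

def to_format(record: dict, fmt: str) -> str:
    """Render a record as FASTA or as a tab-delimited line."""
    if fmt == "fasta":
        return f">{record['id']}\n{record['seq']}"
    if fmt == "tab":
        return f"{record['id']}\t{record['seq']}"
    raise ValueError(f"unknown format: {fmt}")

rec = parse_fasta(">clone42 sample\nACGTACGT\nTTGA")
print(to_format(rec, "tab"))  # clone42<TAB>ACGTACGTTTGA
```

A production archive such as Finch would also handle many records at once, binary trace formats, and access control; the sketch only shows the format-translation idea.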
The center’s original Sun 3500 runs other applications for managing microarray data. Both servers have two CPUs running at 450 megahertz each, two gigabytes of RAM, and more than 300 gigabytes of Fibre Channel-accessible disk storage.
In addition, the center has two Sun SPARCstation 20 workstations, each with two CPUs: one with 258 megabytes of memory and 17 gigabytes of storage, the other with 224 megabytes of memory and 22 gigabytes of storage.
Molecular modeling, proteomics, and bioinformatics tasks run on an IBM Scalable POWERparallel Superscalar Computer with 16 quad-SMP 375-megahertz Power3 nodes, each with 26 gigabytes of main memory and 36 gigabytes of disk storage, plus 80 gigabytes of RAID storage. The machine runs IBM's AIX 4.3.3 operating system.
Large scalar applications run on an Alpha 8400 Superscalar Computer running the Digital Unix 4.0D operating system. The box features eight processors that each churn at a rate of 625 megahertz, eight gigabytes of main memory, and 120 gigabytes of storage.
For RNA folding, proteomics, and genomics, ABCC has several Silicon Graphics machines running the Irix 6.5 operating system. These include a Power Challenge Superscalar Computer with eight 195-megahertz CPUs, two gigabytes of main memory, and 20 gigabytes of storage; an Origin 2000 with 64 CPUs (a mix of 195 and 250 megahertz), 15 gigabytes of main memory, and 80 gigabytes of storage; and an Onyx with four 195-megahertz CPUs, two gigabytes of main memory, and 40 gigabytes of disk storage.
Molecular dynamics, quantum chemistry, and whole-genome comparisons are performed on Cray powerhouses: an SV1-4/96-96 Supercomputer, a J916/8-256 Classic Vector Computer, and a J932/16-1024se Vector Computer, all running the UNICOS 10.0 operating system. The SV1 can perform 115 billion calculations per second using 96 processors clustered into four nodes, two with 32 processors and two with 16; the hardware boasts 96 gigabytes of aggregate memory and 1.12 terabytes of Fibre Channel disk storage. The J916 has eight CPUs rated at 200 megaflops each, two gigabytes of main memory, and 72 gigabytes of disk storage. The J932 has 16 CPUs rated at 400 megaflops each, eight gigabytes of main memory, and 160 gigabytes of Fibre Channel disk storage.
Central storage resides on a Hewlett-Packard/Convex Exemplar UniTree Data Archival Computer and a StorageTek 9311 Nearline Robotic Tape Silo. The HP machine has four PA-RISC 7200 CPUs, 512 megabytes of main memory, and 108 gigabytes of disk storage; it runs the UniTree 3.0 hierarchical storage management system on the SPP-UX 4.2 operating system. The tape silo holds 6,000 3490E tapes and can record 360 tapes an hour.
“With so many heterogeneous operating systems, and so many machines geared toward specific applications, the greatest challenge is keeping all the machines up and talking to one another,” says Burt, who declines to reveal the price of the machines or any other expenditures at the government-owned facility.
To keep it all going, ABCC has eight system administrators: one primary administrator responsible for each architecture type, plus backup staff trained in several of them. And to ensure that all the hardware works well together, the center runs several different load-balancing programs, including PBS for the SGI machines, LoadLeveler for the IBM, and NQS and NQE for the Crays. Home-grown applications assign tasks to the various machines based on software type, load on the machine, length of the job, and memory required. Customized middleware lets all the different types of hardware and software talk to one another. Front-ending all of these applications is an Apache-based Web interface that scientists use to log on.
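The routing rule Burt describes, matching each job to a platform by software type, current load, job length, and memory, can be sketched as a simple scoring heuristic. The machine names, data structures, and weighting below are invented for illustration and are not ABCC's middleware:

```python
# Hypothetical sketch of ABCC-style job routing: pick a machine for a job
# based on software availability, current load, job length, and memory.
# All names and the scoring rule are illustrative, not ABCC's actual code.

from dataclasses import dataclass

@dataclass
class Machine:
    name: str
    software: set     # applications installed on this platform
    mem_gb: float     # total main memory
    load: float       # current load fraction, 0.0 (idle) to 1.0 (saturated)

@dataclass
class Job:
    app: str          # application the job needs
    mem_gb: float     # memory required
    hours: float      # estimated runtime

def route(job: Job, machines: list) -> Machine:
    """Return the least-loaded machine that has the app and enough memory."""
    eligible = [m for m in machines
                if job.app in m.software and m.mem_gb >= job.mem_gb]
    if not eligible:
        raise RuntimeError(f"no machine can run {job.app}")
    # Prefer lightly loaded machines; long jobs weigh load more heavily,
    # since they occupy the platform for longer.
    return min(eligible, key=lambda m: m.load * max(job.hours, 1.0))

machines = [
    Machine("origin2000", {"rna-fold", "blast"}, mem_gb=15, load=0.8),
    Machine("sv1",        {"amber", "blast"},    mem_gb=96, load=0.3),
]
print(route(Job(app="blast", mem_gb=4, hours=12), machines).name)  # sv1
```

In practice the center layers this kind of placement logic on top of per-platform schedulers such as PBS and LoadLeveler, which then queue and dispatch the job locally.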
On top of all of this, ABCC runs hundreds of software applications spanning a wide range of research areas. The facility also helps users write applications, assisting with algorithm optimization through standard debuggers, compiler directives, and code analysis, and with software design and coding through Rational Rose.
With all of this technology in place, “we have enough capacity for our users’ needs,” Burt says. “Our problem is one of oversubscription — we have too many people trying to use the facilities at once. But it’s never one person wanting too much for themselves. Our systems are very reliable. The main problem we have is with scheduled downtime for maintenance, which we schedule for off-hours, after midnight.”
— Jackie Cohen