A challenge from Celera Genomics president Craig Venter, who contends that even the most powerful of today’s supercomputers do not meet the needs of the genomic era, has resulted in a three-way collaboration between Celera, Sandia National Laboratories, and Compaq Computer to build a supercomputer capable of 100 trillion operations per second by 2004.
The massively parallel configuration could contain over 20,000 Compaq Alpha processors. It will be eight times faster than the fastest supercomputer currently available and 80 times faster than Celera’s current supercomputer.
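The figures above imply a few numbers that the partners did not state directly. The short Python sketch below works them out, on the assumption that both speed multiples are measured against the 100-trillion-operations-per-second target and that the machine has exactly 20,000 processors (the reported figure is "over 20,000," so the per-processor rate is an upper bound on the average). The function and constant names are illustrative only.

```python
# Back-of-the-envelope arithmetic from the figures quoted in the article.
# Assumption: the 8x and 80x multiples are relative to the 100-trillion target.
TARGET_OPS_PER_SEC = 100e12   # 100 trillion operations per second (stated goal)
PROCESSOR_COUNT = 20_000      # "over 20,000" in the article; treated as a minimum
SPEEDUP_VS_FASTEST = 8        # versus the fastest supercomputer currently available
SPEEDUP_VS_CELERA = 80        # versus Celera's current supercomputer


def implied_figures():
    """Return derived rates, in operations per second."""
    return {
        # Average rate each processor must sustain (upper bound, since the
        # real processor count may exceed 20,000).
        "ops_per_processor": TARGET_OPS_PER_SEC / PROCESSOR_COUNT,
        # Implied speed of today's fastest machine.
        "fastest_current_ops": TARGET_OPS_PER_SEC / SPEEDUP_VS_FASTEST,
        # Implied speed of Celera's current machine.
        "celera_current_ops": TARGET_OPS_PER_SEC / SPEEDUP_VS_CELERA,
    }


if __name__ == "__main__":
    for name, value in implied_figures().items():
        print(f"{name}: {value / 1e12:.4f} trillion ops/sec")
```

Under those assumptions, each processor would need to average about 5 billion operations per second, today's fastest machine would run at roughly 12.5 trillion, and Celera's current system at roughly 1.25 trillion.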
The partnership signals biology’s newfound influence in the high-performance computing market, which has traditionally served physics, engineering, and defense applications. Sandia’s involvement in the project is particularly indicative of the paradigm shift, as this is the first foray into the life sciences for the lab best known for using its supercomputing power to simulate nuclear explosions.
“Just three years ago, the computational needs of biology were thought to be minor and irrelevant to the computing industry. Today, biologists are setting the pace of development for the industry,” Venter said in a statement.
Sandia will collaborate with Compaq on the computer’s architecture, which will be based on Compaq’s AlphaServer supercomputer family. “Our expectation is that we’ll be moving in some directions that allow higher performance and more scalability around the interconnect and system software,” said Ty Rabe, director of high-performance technical computing solutions at Compaq. He added that it’s still too early to know exactly what approach the team will take.
Sandia and Compaq will both work on the operating software for the supercomputer, while Sandia will cooperate with Celera on parallelizing algorithms that it will develop to run on the system. Rabe said that Compaq may modify or develop compilers for some of the algorithms. “The intent here is really to develop a system that’s optimized around the applications for this field,” Rabe said.
George Davidson, manager of evolutionary computing methods at Sandia, told BioInform that the lab would focus on ensuring the scalability of the system, meaning its ability to run the same codes regardless of the number of processors it contains. Sandia will also work on balancing the processor speed, communication ability, and input/output of the architecture. “If you’re dealing with databases, as Celera’s work will involve, you want a balance with your I/O system,” Davidson said. “It doesn’t do any good to compute at lightning speed if your I/O system to your databases are so slow that they can’t keep up.”
Venter said at the formal agreement signing for the collaboration that the heightened processing power would be used to solve biology’s next big problem: integrating genomic and proteomic information into a systematic understanding of human biological functions in order to better understand health and disease.
Under the terms of the agreement, Celera will pay Sandia approximately $30 million to $40 million over the next four years. Neither Sandia nor Celera will be obligated to buy the supercomputer. Compaq intends to market a commercial version of the scalable system at the end of the project.
Celera, Compaq, and Sandia ultimately intend to develop the supercomputer through the petacruncher level — 1,000 trillion operations per second. Although IBM’s Blue Gene supercomputer project, launched in 1999, aims to reach the same processing speed, Rabe said that the Compaq/Sandia computer will have broad applications beyond genomics, while Blue Gene is being designed to simulate protein folding.
“[IBM] is building something that’s extremely narrowly focused on one application,” Rabe said. “You can do that one job quite well but if you want to do anything even moderately different you’re kind of stuck. So we’re trying to do something that’s much more broadly useful for scientists in this field.”
Sandia’s Davidson said that the general-purpose capability of the computer would benefit other ongoing projects at the lab, and “American industry in general.”
“Being able to buy one of these affordable, scalable and balanced computers is going to be a very liberating thing for all of us,” Davidson said.
Anne Marie Deroualt, life sciences director of business development and marketing at IBM, countered that Blue Gene “is going to be applicable to the broadest possible range of problems.”
Rabe said that he expects biosciences to be the fastest growing segment of the market over the next several years.
Genomics is “a revolutionary field,” he said, “and it has become quite an important aspect of our strategy for high-performance computing.”
“I think it took a lot of people by surprise that biology was going to suddenly show up and be a major consumer of computing,” Davidson said, “and certainly Celera is one of the companies showing the way.”
Davidson compared the recent advances in genomic technology with the changes that massively parallel systems brought to computer science. “Throughout our business, there were people who were working with a small number of processors, just like there were people working with a small number of genes, and then suddenly this revolution dumped this capability to look at tens of thousands of processors, which is similar to tens of thousands of genes. It’s just a different mindset,” he said.
“We in the nuclear weapons industry felt for many years that nothing could be more complex than nuclear physics,” Paul Robinson, president of Sandia, said at the signing, “but nothing beats the complexity of biological sciences, the human genome, and challenges that are ahead.”
— BT and MMJ