As further evidence of high-performance computing’s central role in the quickly developing proteomics market, Sun Microsystems announced a partnership last week with Oxford GlycoSciences to provide the processing and storage capabilities behind OGS’s proteomics research.
The news follows swiftly behind GeneProt’s announcement that its recently launched proteomics facility in Geneva is supported by a Compaq system, which it says is the world’s most powerful commercial supercomputer.
In January, MDS Proteomics and IBM entered a strategic partnership that included the installation of three superclusters of IBM’s Eserver systems running Linux and Unix. MDS will also use IBM’s DB2 database system and DiscoveryLink data integration software under the alliance. The 700 gigaflop cluster will process data generated from mass spectrometers located in North America and Europe.
Sun’s OGS installation is based on a Sun Enterprise 10000 server connected to a 10-terabyte storage area network of StorEdge A5200 arrays. There are 12 Fibre Channel arrays, each containing twenty-two 36-gigabyte drives. Sun said the entire system offers 50 terabytes of backup capacity using a StorEdge L700 tape library.
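As a rough check on the quoted figures, the raw capacity of the disk arrays can be worked out from the drive counts above. This is only a sketch: the function and variable names are illustrative, and it assumes decimal units (1 terabyte = 1,000 gigabytes) and raw capacity before any RAID or formatting overhead, neither of which the announcement specifies.

```python
# Figures quoted in the article for the StorEdge A5200 storage area network.
ARRAYS = 12            # arrays in the SAN
DRIVES_PER_ARRAY = 22  # drives in each array
DRIVE_GB = 36          # capacity of each drive, in gigabytes


def raw_capacity_tb(arrays, drives_per_array, drive_gb, gb_per_tb=1000):
    """Return raw capacity in terabytes.

    Assumes decimal units (1 TB = 1000 GB) and ignores RAID or
    file-system overhead; both are assumptions, not stated by Sun.
    """
    return arrays * drives_per_array * drive_gb / gb_per_tb


total_drives = ARRAYS * DRIVES_PER_ARRAY
raw_tb = raw_capacity_tb(ARRAYS, DRIVES_PER_ARRAY, DRIVE_GB)

print(f"{total_drives} drives, about {raw_tb:.1f} TB raw")
```

Under those assumptions the arrays hold 264 drives and roughly 9.5 terabytes of raw capacity, in line with the 10-terabyte figure Sun cites for the storage area network.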
Steve Perrenod, group manager for high-performance computing at Sun, said that OGS’s storage requirements were a key factor in the design of the system. “The ability to be able to take a lot of that information and put into a central memory and share those very large databases is critical here,” he said. The system’s 64 processors each have equal access to the central memory repository, he said.
Andrew Lyall, IT director and CIO of OGS, said he expects the firm’s 15 terabytes of online storage and 50 terabytes of nearline storage to “grow dramatically over the next few years as we do more deals with people.” OGS currently has proteomics research agreements with Pfizer, Bayer, Merck, Medarex, and GlaxoSmithKline.
The massive storage requirement drives the need for a scalable server, Perrenod added, “so you need a balanced system that is really able to handle both.”
“Our core proteomics capability is run seven days a week so we needed systems that have very high availability. The E 10000 is well known for having high availability because it has multiple domains and you can fail over from one to the other. You can carry on running with the failure of any one component,” Lyall said.
OGS is a core member of Sun’s Informatics Advisory Council, which was established in September 2000 to help address life science computing challenges. Sia Zadeh, group manager of Sun’s life sciences division, said that storage was among the top IT challenges the IAC identified for proteomics research.
Sun’s other proteomics partners include the Protein Data Bank, which is housed on Sun servers at the San Diego Supercomputer Center.
The OGS architecture partnership builds on Sun’s IAC partnership with OGS, Zadeh said. He noted that unlike Compaq, which took a $10 million equity investment in GeneProt as part of its supercomputer placement, and IBM, which took a $10 million equity investment in MDS, Sun follows a partner-neutral high-performance computing strategy.
Financial terms of the Sun/OGS partnership were not disclosed, though Lyall noted that Sun offered a “very good price.”
Lyall said that beyond the massive storage requirements of proteomics research, another IT challenge is integrating proteomics data with genomics data. He said that OGS is developing an in-house system to do this.
While other proteomics companies tend to just measure the mass of proteins, Lyall said, OGS uses tandem mass spectrometry to sequence the peptides, which it can then assign to a particular portion of the genome.
Zadeh said that Sun sees proteomics as a key growth area since researchers are “just scratching the surface” of the amount of information to be gained from proteomics technology.
“This explosion in proteomics is just getting started and I’m sure the computational and storage demands are a hundred to a thousand times what they have been in genomics identification and mapping,” said Perrenod.