New York State expects biotechs and other businesses to benefit from a program it launched this week that will provide free access to a 100-teraflop IBM Blue Gene/L supercomputer housed at Rensselaer Polytechnic Institute.
Under the three-year program, the New York State Foundation for Science, Technology and Innovation, or NYSTAR, is offering nearly 150 million CPU-hours on the system. Businesses and academic researchers in the state can apply for an unspecified amount of free time on the machine.
“We’re providing entry-level use to the computer,” NYSTAR executive director Ed Reinfurt said in a conference call announcing the program. Once the CPU hours exceed an undisclosed “certain capacity,” users will be charged a fee. NYSTAR did not provide details on the fee structure.
“We’re trying to get people comfortable with the use of the computer and at some point in time, it will stop being free,” he said.
There will also be “different arrangements” with companies depending on whether the research is completely proprietary or if it involves an academic collaboration, Reinfurt said.
The system is already helping at least one life-science firm in the state — Ithaca-based computational systems biology firm Gene Network Sciences, which has had access to the RPI Blue Gene system since it was installed nearly a year ago.
“For us it has been fantastic,” Colin Hill, GNS president and CEO, told BioInform. The company is using the system for its proprietary technology called Reverse Engineering/Forward Simulation, or REFS, which performs in silico experiments and makes predictive simulations with massive datasets.
“We take experimental data that has been designed for the purposes of either target discovery or drug mechanism or efficacy discovery and use that with very large supercomputers to reverse engineer mechanisms from data,” he said.
The REFS technology is computationally hungry. It generates hundreds of thousands of models for each of these experiments, and then uses a global optimization approach to identify those models that best account for the patterns in the data.
Because of REFS’ intense computational requirements, access to the RPI system “is very attractive financially for us, especially since we don’t have to put a really big capital expense toward an asset like that, [which] depreciates so quickly and needs that kind of updating, care, and feeding,” he said.
Giving Businesses CPU Hours
RPI’s Blue Gene supercomputer has 2,000 processors per rack and a total of 16 racks, said John Kolb, RPI’s vice president for information services and technology as well as the institute’s chief information officer.
He said that the service is now operational and accessible through NYSERNet, a statewide high-performance networking backbone.
NYSERNet also includes a 2,000-processor AMD Opteron cluster, as well as several symmetric multiprocessing, or SMP, machines, which can be used in several different ways for different problems, Kolb said.
“We expect to get people on all different compute architectures,” he said. “As they get more proficient in this massively parallel environment, they will move up from the Opteron [cluster] to the Blue Gene,” he said.
Kolb said that the RPI site has a secure machine room, firewalls, and encryption in place. “Some companies might take additional precautions in how they ship data to and from the supercomputer,” he said.
Separately, RPI has applied for a $1 million NYSTAR grant to encourage usage and help manage access and job scheduling on Blue Gene, as well as help users scale their code and convert their datasets to run on the massively parallel supercomputer. The review process for this grant has not been completed.
The supercomputer, which operates at 80 teraflops with a peak capacity of 100 teraflops, is housed in the data center of RPI’s Computational Center for Nanotechnology Innovations, or CCNI.
The institute, NYSTAR, and IBM jointly financed the supercomputer’s $100 million cost. Unveiled last year, the supercomputer was initially touted as a system designed to advance nanotechnology, but with this week’s announcement, the focus appears to have broadened.
When it was inaugurated last year, 20 percent of Blue Gene’s capacity was reserved for state purposes, Reinfurt said. Now New York Governor Paterson has decided that share should be used for “economic-development purposes” to fuel business innovation in the state, he said.
A Comfortable Parallel World
GNS previously ran its algorithms on Linux clusters, but the scale of computing has changed over time, Hill said. When he was a graduate student doing pathway modeling at Cornell University, “we would be super-fortunate to gain access to a 200- or 300-processor machine they had there, which was the fastest non-military machine at the time.”
Now, he said, he can access up to 30,000 processors at RPI on demand. “We can plug into this as if it was electricity out of a socket.”
He added that unlike other on-demand computing models, like the “cloud” offerings by Amazon, Yahoo, and Google that link many low-cost processors together for compute jobs, the IBM Blue Gene is integrated and allows his team to use algorithms that can “really take advantage of that high-speed inter-communication.”
GNS had access to a Blue Gene system directly through IBM before RPI had its machine, so the company’s researchers have gathered experience working in this massively parallel environment, Hill said. “We got to scale and [could] plug and play by the time we got access to the RPI machine,” he said.
“At the beginning we did have to adapt the software to make use of the very fast interconnects between the nodes in the Blue Gene,” he said. “But that adaptation was completed years ago, even before GNS ran simulations on the RPI machine.”
A Tough Backbone
In tight budgetary times and given New York’s fiscal environment, there is limited cash to offer economic incentives, Reinfurt said, so “it is exciting to have at your disposal the ability to connect” small and large businesses to a supercomputing facility.
Companies working in molecular modeling, for example, need to work through typical trial-and-error R&D cycles to accelerate their experiments. “We believe this has … tremendous potential to help companies that are in the nano-science, the nano-electronics field, the life sciences and biology, chemistry, and advanced materials,” he said.
To run a job on the machine, users first need to apply for time through NYSTAR. Once approved, they need to be certified through RPI, which will then schedule the job. There is currently no backlog on the RPI Blue Gene system, Reinfurt said.
The high-performance computing capabilities represented by the New York State resource “will play an important role in the future of [New York’s] life-sciences industry,” according to Clinton Rubin, who chairs the department of biomedical engineering and directs the Center for Biotechnology at the State University of New York at Stony Brook.
Rubin told BioInform via e-mail that the system should “facilitate the interaction of academic and industry scientists across a broad range of disciplines to interact in solving complex computational problems that ultimately will lead to new diagnostics and therapeutics for the detection and treatment of disease.”
Rubin added that the resource will help New York “develop a unique IT infrastructure that will fuel the growth of NY's bioscience industry by training the next generation of workforce, and providing a significant competitive advantage in bringing new products to market.”