Bioinformatics at the Cornell Theory Center is hitting its stride now, following a $160 million collaboration announced in July between the CTC, Cornell University, Weill Medical College, the Memorial Sloan-Kettering Cancer Center, and Rockefeller University.
The partnership, funded in part by a gift from a private donor, included support for high-performance computing resources and staff at the CTC, as well as a dozen new joint faculty positions. Among those faculty is Ron Elber, a Cornell professor of computer science and the lead researcher for the initiative’s target area in computational biology.
The center is also funded by Cornell University, New York State, and over 75 corporate partners, universities, and government agencies that participate in the CTC’s Advanced Cluster Computing Consortium.
CTC currently operates a complex of Windows 2000 HPC clusters for its community of users, who span a range of disciplines. Its primary cluster system, Velocity, provides 64 four-processor nodes for parallel computation. In addition, the 128-processor Velocity+ is dedicated to high-end simulations, including protein folding. Several smaller clusters provide resources for systems research and other activities.
David Lifka, associate director of the CTC, said that the center is growing at a rate of one terabyte every six months. The center supports over 650 users on the Cornell campus alone and also maintains clusters for a number of corporate, government, and academic partners.
The United States Department of Agriculture’s Center for Agricultural Bioinformatics is housed at the CTC, for example, and recently added a terabyte of disk space to its 48-processor Dell/Windows cluster there. The USDA Plant Genome Informatics unit at the CTC supports the largest cluster of plant genome databases in the country (including GrainGenes, RiceGenes, SolGenes, CabbagePatch, RoseDB, and RiceBlastDB).
In addition to the USDA cluster, the National Institutes of Health’s National Center for Research Resources supports the Center for Computational Biology Solutions at the CTC, with a research emphasis on protein folding, protein dynamics, and bioinformatics.
Headed by Elber, the NIH Parallel Processing Resource for Biomedical Scientists recently released free computational biology tools (www.tc.cornell.edu/reports/NIH/resource/CompBiologyTools).
Elber said that one of the more popular tools in the package is the LOOPP (learning, observing, and outputting protein patterns) protein family prediction server, where users can submit sequences via e-mail and receive a response about the relevant protein family.
“The response was a little bit overwhelming,” said Elber of the server. “At the beginning we started on a PC, but we had to scale it up very quickly and put it on the resources at the theory center.” The LOOPP server (ser-loopp.tc.cornell.edu/loopp.html) currently gets around 6,000 requests a week.
Elber’s group is also working on a large-scale linear programming problem, in which they train their energy function on sets of known proteins and sequences so that the correct sequences will always find the correct functions. So far they’ve built a database of around 50 million false positives.
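For readers curious about what such a training problem looks like, here is a toy sketch in Python. It is not the group's code, and the article does not describe their formulation in detail. The sketch assumes an energy function that is linear in its parameters, so that each false positive (decoy) contributes one linear inequality: the decoy must score at least a fixed margin above the correct match. The feature vectors, the margin, and the function names are all hypothetical. In place of a true linear programming solver, the sketch uses a simple perceptron-style update, which finds a feasible set of weights when one exists but does not optimize an objective.

```python
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def energy(weights: Sequence[float], features: Sequence[float]) -> float:
    """Hypothetical energy: linear in the trainable weights."""
    return dot(weights, features)


def train_energy_weights(
    native_features: Sequence[Sequence[float]],
    decoy_features: Sequence[Sequence[float]],
    margin: float = 1.0,
    step: float = 0.1,
    max_epochs: int = 1000,
) -> Vector:
    """Search for weights w with energy(decoy) - energy(native) >= margin.

    Each (native, decoy) pair gives one linear inequality in w.
    A perceptron-style update stands in for an LP solver here.
    """
    # One difference vector per inequality: w . (decoy - native) >= margin
    diffs = [
        [d - n for n, d in zip(native, decoy)]
        for native, decoy in zip(native_features, decoy_features)
    ]
    weights = [0.0] * len(diffs[0])
    for _ in range(max_epochs):
        updated = False
        for diff in diffs:
            if dot(weights, diff) < margin:
                # Constraint violated: nudge the weights toward satisfying it.
                weights = [w + step * x for w, x in zip(weights, diff)]
                updated = True
        if not updated:
            return weights  # every inequality holds
    raise ValueError("no feasible weights found within max_epochs")


# Made-up two-feature examples: each native is paired with one decoy.
natives = [[1.0, 0.0], [2.0, 1.0], [0.5, 0.5]]
decoys = [[0.0, 1.0], [1.0, 3.0], [0.0, 2.0]]
weights = train_energy_weights(natives, decoys)
```

With the trained weights, every correct match in the toy set has a lower energy than its paired decoy. A problem with around 50 million inequalities would call for a real large-scale solver and parallel hardware rather than a loop like this one.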
“This is a very large-scale calculation that we are able to do thanks to the theory center’s high performance computing facility,” Elber said. “By all standards, I think this is one of the largest problems that’s ever been solved.”
The biology group recently boosted its computational power at the CTC by a factor of 10, and is looking for another factor of 10 in another month, Elber said.
“They’ve outrun everything with this system,” said Lifka of the computational biology group.
The CTC is currently seeking a bioinformatics project leader to head up a new bioinformatics unit as part of the $160 million initiative.