From protein databases to new analytical methods for expression data to advanced computational modeling, systems biology relies heavily on the tools of computer science to derive functional networks from huge amounts of biological data. At MIT’s CSBi Symposium on Systems Biology Jan. 8-9, researchers were eager to share some of the informatics methods they have developed to support their broader systems biology research efforts. A sampling of some of the tools follows:
Neil Kelleher from the University of Illinois said that his lab has created a database of biologically possible post-translational modifications called ProSight PTM (https://prosightptm.scs.uiuc.edu/). Built with data from Kelleher's "top down" approach to proteomics, which characterizes intact proteins rather than peptides, the web-based system includes more than 250,000 known and predicted protein forms for eight organisms.
MIT's Catherine Drennan discussed how her group is using crystallography to study the function of the protein complex carbon monoxide dehydrogenase/acetyl-coenzyme A synthase (CODH/ACS), which the anaerobe Moorella thermoacetica uses to process carbon monoxide and carbon dioxide. Specifically, Drennan's team was trying to determine how a single CO molecule appeared to travel from one active site on the complex to another over a distance of 70 Å, quite an odyssey in biomolecular terms. Turning to CAVENV, a program in the CCP4 (Collaborative Computational Project No. 4) software suite (http://www.ccp4.ac.uk/main.html) that calculates cavities in macromolecular structures, they found a "channel" running through the complex that could quickly transport the molecule, a result they later validated experimentally.
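The article doesn't describe how the cavity calculation works, but the general idea behind such programs can be sketched: lay a grid over the structure, mark grid points that lie farther than a cutoff from every atom as empty, and flood-fill adjacent empty points into connected cavities or channels. The function below is a simplified illustration, not CAVENV's actual algorithm; real programs use per-atom van der Waals radii plus a solvent-probe radius rather than the single cutoff assumed here.

```python
from collections import deque

def find_cavities(atoms, box, spacing=1.0, atom_radius=1.0):
    """Toy grid-based cavity finder.
    atoms: list of (x, y, z) atom centers; box: ((x0, y0, z0), (x1, y1, z1)).
    A single atom_radius cutoff stands in for per-atom radii; ties and
    partial overlaps are ignored to keep the sketch short."""
    (x0, y0, z0), (x1, y1, z1) = box
    nx = int(round((x1 - x0) / spacing)) + 1
    ny = int(round((y1 - y0) / spacing)) + 1
    nz = int(round((z1 - z0) / spacing)) + 1
    r2 = atom_radius ** 2

    # Mark grid points farther than the cutoff from every atom as empty.
    empty = set()
    for i in range(nx):
        for j in range(ny):
            for k in range(nz):
                px, py, pz = x0 + i * spacing, y0 + j * spacing, z0 + k * spacing
                if all((px - ax) ** 2 + (py - ay) ** 2 + (pz - az) ** 2 > r2
                       for ax, ay, az in atoms):
                    empty.add((i, j, k))

    # Flood-fill face-adjacent empty points into connected components;
    # a long, thin component spanning the structure is a candidate channel.
    cavities, seen = [], set()
    for start in empty:
        if start in seen:
            continue
        component, queue = [], deque([start])
        seen.add(start)
        while queue:
            i, j, k = queue.popleft()
            component.append((i, j, k))
            for di, dj, dk in ((1, 0, 0), (-1, 0, 0), (0, 1, 0),
                               (0, -1, 0), (0, 0, 1), (0, 0, -1)):
                nbr = (i + di, j + dj, k + dk)
                if nbr in empty and nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        cavities.append(component)
    return cavities
```

On a real structure one would read atom coordinates from a PDB file and look for the component connecting the two active sites.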
MIT's Tommi Jaakkola described how he is using computational modeling to guide experimental design and "help indicate what experiments need to be done to find the missing data." Jaakkola said the models he has built predict the effects of knockout experiments fairly reliably, and can cut the number of experiments required roughly in half.
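The talk summary doesn't spell out the selection criterion, but one common way to let a model guide experimental design, sketched hypothetically below rather than as Jaakkola's actual method, is to keep several candidate network models alive and choose the next knockout whose predicted outcomes the candidates disagree on most, since that measurement eliminates the largest number of models.

```python
# Hypothetical illustration of model-guided experiment selection:
# prefer the experiment whose predictions split the candidate models
# most evenly, so the observed result rules out the most candidates.
from collections import Counter

def most_informative_experiment(models, experiments):
    """models: callables mapping an experiment to a predicted outcome.
    Returns the experiment with the greatest disagreement among models."""
    def disagreement(exp):
        counts = Counter(m(exp) for m in models)
        # Models outside the majority prediction are eliminated if the
        # majority turns out to be right; more of them = more informative.
        return len(models) - max(counts.values())
    return max(experiments, key=disagreement)

# Toy candidate models of a small circuit: each predicts whether gene B
# stays "on" or goes "off" when the named gene is knocked out.
model_a = lambda ko: "off" if ko == "A" else "on"          # A activates B
model_b = lambda ko: "on"                                  # B is constitutive
model_c = lambda ko: "off" if ko in ("A", "C") else "on"   # A and C activate B

best = most_informative_experiment([model_a, model_b, model_c], ["A", "C", "D"])
```

Here knocking out "D" is useless (all three models predict "on"), while "A" and "C" each split the candidates, so one of those is chosen.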
Todd Golub of the Broad Institute described a new project his lab has launched to create a “connectivity map” that will combine multiple microarray experiments in a single view “to explain the effects of multiple perturbations,” such as RNAi silencing or small molecules. In a pilot experiment, he said, his lab mapped a set of 22,000 genes from tissue samples that were untreated, treated with vehicle, treated with three separate anti-diabetic drugs, or treated with an anti-epileptic drug. Golub said he is planning to make the connectivity map publicly available via the Broad Institute’s cancer genomics page (http://www.broad.mit.edu/cancer/).
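Golub didn't describe the scoring scheme, but the core operation of such a map can be sketched: treat each perturbation's expression profile as a signature, then rank reference signatures by similarity to a query. The snippet below uses plain Spearman rank correlation purely as an illustrative stand-in; it is not presented as the Broad project's actual method, and the gene values and drug names are invented.

```python
# Hypothetical connectivity-map-style matching: rank reference
# perturbation signatures by rank correlation with a query signature.

def rank(values):
    """Rank positions in sorted order (ties get arbitrary order; a real
    implementation would average tied ranks)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for r, i in enumerate(order):
        ranks[i] = r
    return ranks

def spearman(x, y):
    """Spearman correlation, assuming no ties, via Pearson on ranks."""
    rx, ry = rank(x), rank(y)
    mean = (len(x) - 1) / 2
    cov = sum((a - mean) * (b - mean) for a, b in zip(rx, ry))
    var = sum((a - mean) ** 2 for a in rx)
    return cov / var

def connect(query, references):
    """Return (name, score) pairs, best-matching perturbation first."""
    scored = [(name, spearman(query, profile))
              for name, profile in references.items()]
    return sorted(scored, key=lambda t: -t[1])

# Invented example: a 4-gene query signature against two drug profiles.
query = [1.0, 2.0, 3.0, 0.5]
refs = {"drugX": [0.9, 2.1, 3.2, 0.4],   # same up/down pattern as query
        "drugY": [4.0, 3.0, 2.0, 4.5]}   # opposite pattern
ranking = connect(query, refs)
```

A profile moving the same genes in the same direction scores near +1; an anti-correlated one (a candidate for reversing the query's phenotype) scores near -1.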
Steve Wiley of the Biomolecular Systems Initiative at Pacific Northwest National Laboratory described a "computational cell environment" that scientists at PNNL are developing to serve as an infrastructure for systems biology. The collaborative environment integrates data, permits users to share information, and offers a workbench of analytical tools for sequencing, gene and protein expression, and networks. Wiley said PNNL researchers are also using Starlight (http://starlight.pnl.gov/), a visualization platform originally developed for the US intelligence community to identify terrorist threats, to reveal relationships among disparate sets of biological data.