With a background in astrophysics, computational biologist Imran Shah isn’t afraid of big numbers or hairy computer problems. That’s good for a person who wants to figure out all the proteins in the universe.
Shah — a native of Pakistan and the first recipient of a PhD in computational biology at George Mason University — joined the University of Colorado faculty last year and launched a project to explore new techniques for discovering proteins and their functions using microbes such as E. coli and H. pylori as model organisms.
Shah draws on available databases of enzymes and biochemical reactions to construct a network that maps out metabolic pathways and possible connections between them. “If we’ve identified half the proteins and want to figure out what the other half is doing, this can help us fill in the gaps,” he says. Shah relies on a cluster of eight Linux-based computers (and government supercomputers when needed) to run machine-learning algorithms, written in LISP, that pore over thousands of chemical reactions and generate rules about how one substance can transform into another. The resulting hypotheses, Shah says, could give wet-lab biologists useful clues about potential pathways or links between compounds.
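To make the idea of a reaction network concrete, here is a minimal sketch, assuming a tiny hand-entered set of reactions (the first three steps of glycolysis plus one branch into the pentose phosphate pathway). It is not Shah’s method: his system uses machine-learning algorithms written in LISP over large databases, while this example simply treats compounds as nodes and enzyme-catalyzed reactions as edges, then runs a breadth-first search to retrace a known pathway. The names `REACTIONS`, `build_network`, and `find_pathway` are hypothetical, chosen for illustration.

```python
from collections import deque

# Hypothetical toy reaction set: (substrate, product, enzyme).
# A real system would draw thousands of these from enzyme/reaction databases.
REACTIONS = [
    ("glucose", "glucose-6-phosphate", "hexokinase"),
    ("glucose-6-phosphate", "fructose-6-phosphate", "phosphoglucose isomerase"),
    ("fructose-6-phosphate", "fructose-1,6-bisphosphate", "phosphofructokinase"),
    ("glucose-6-phosphate", "6-phosphogluconolactone", "glucose-6-phosphate dehydrogenase"),
]


def build_network(reactions):
    """Map each compound to the (product, enzyme) pairs reachable in one step."""
    graph = {}
    for substrate, product, enzyme in reactions:
        graph.setdefault(substrate, []).append((product, enzyme))
    return graph


def find_pathway(graph, start, goal):
    """Breadth-first search: return the shortest chain of reaction steps, or None."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        compound, steps = queue.popleft()
        if compound == goal:
            return steps
        for product, enzyme in graph.get(compound, []):
            if product not in seen:
                seen.add(product)
                queue.append((product, steps + [(compound, enzyme, product)]))
    return None  # no chain of known reactions connects the two compounds


network = build_network(REACTIONS)
path = find_pathway(network, "glucose", "fructose-1,6-bisphosphate")
for substrate, enzyme, product in path:
    print(f"{substrate} -> {product} via {enzyme}")
```

Breadth-first search returns the shortest chain of reactions; a gap in that chain (a `None` result where biologists expect a connection) is the kind of hole that a hypothesized, still-unidentified enzyme might fill.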
Some scientists are intrigued by this strategy, while others think Shah might be tackling more than he and his computers can handle. Shah has demonstrated the concept by retracing about 10 known metabolic pathways. By the end of the year, he hopes to find hints of new functions for known proteins. If he’s lucky, the discovery of new proteins will soon follow.
— Steve Nadis