By Steve Nadis
It was the middle of “A.I. winter” when Larry Hunter got his PhD in computer science in 1989. Most opportunities in artificial intelligence were in military applications, but “I liked the idea of trying to help people rather than kill them,” says Hunter, who opted instead for the nascent field of computational biology. He now directs the Center for Computational Pharmacology at the University of Colorado Health Sciences Center, where his laboratory is at the forefront of applying digital tools to biomedical problems. Among its tasks is analyzing data generated by the university’s gene expression array facility, one of the largest in the academic world.
Chips at the Colorado facility currently monitor the expression of more than 10,000 genes simultaneously, and new arrays will accommodate an entire human genome.
“Biologists don’t want to look at 30,000 genes at once,” says Sonia Leach, a Brown University graduate student training in Hunter’s lab. “We use computational tricks to trim that down to the 10, 20, or 100 genes they’re really interested in.”
These techniques, known as machine-learning algorithms, attempt to automate gene expression analysis. “This lab stands out because of its expertise in machine learning,” says Imran Shah, a computational biologist who collaborates with Hunter. The two have developed algorithms that classify proteins into families by looking for patterns in the arrangement of amino acids. These algorithms can improve, or learn, over time as they are exposed to more and more examples.
The software is similar to face-recognition programs that reduce a mug to a set of key features. In this case, says Shah, “we focus on interesting sub-regions in the protein sequence, thereby transforming a long list of amino acids into something more biologically relevant.”
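To make the idea of reducing a sequence to key features concrete, here is a minimal Python sketch. It is not Hunter and Shah's software. It assumes proteins are strings of one-letter amino-acid codes, uses overlapping three-letter substrings (k-mers) as stand-in "features," and assigns a new protein to whichever family's pooled profile it most resembles. The class `FamilyClassifier`, its methods, and the family labels are hypothetical names chosen for illustration.

```python
import math
from collections import Counter

def kmer_profile(seq, k=3):
    """Reduce an amino-acid string to counts of its overlapping length-k substrings."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def cosine(a, b):
    """Cosine similarity between two k-mer count profiles (0.0 if either is empty)."""
    dot = sum(a[key] * b[key] for key in a if key in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

class FamilyClassifier:
    """Hypothetical nearest-centroid classifier over k-mer profiles."""

    def __init__(self, k=3):
        self.k = k
        self.totals = {}  # family label -> summed k-mer counts of its examples

    def learn(self, seq, family):
        # Each labeled example is folded into its family's profile, so the
        # classifier has more to compare against as examples accumulate.
        self.totals.setdefault(family, Counter()).update(kmer_profile(seq, self.k))

    def classify(self, seq):
        # Summed counts point the same way as the mean, so cosine similarity
        # to the sum is equivalent to similarity to the family centroid.
        profile = kmer_profile(seq, self.k)
        return max(self.totals, key=lambda fam: cosine(profile, self.totals[fam]))

clf = FamilyClassifier()
clf.learn("GHGAFGKVGHGAFGKV", "family_A")  # toy sequences, not real proteins
clf.learn("WWPPWWPPWWPP", "family_B")
print(clf.classify("GAFGKVGH"))  # prints family_A
```

Real protein-family classifiers use far richer features, such as conserved motifs or position-specific scoring, but the shape is the same: turn a long sequence into a compact feature vector, then compare vectors.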
The lab has also developed clustering techniques that identify co-regulated genes. Hundreds of genes are turned on, for example, to make ribosomes. When that happens, messenger RNA concentrations rise and fall “in lockstep,” says Hunter. “We don’t want to treat these changes as independent. The goal is to focus our attention on events that are rare, not common.”
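One simple way to spot genes moving “in lockstep” is to compare their expression profiles across the same series of samples with Pearson correlation and group genes whose profiles correlate strongly. The sketch below is a hypothetical illustration, not the lab's clustering method: it assumes each gene's profile is a list of measurements taken at the same time points, and the 0.9 cutoff and gene names are arbitrary.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def cluster_coregulated(profiles, threshold=0.9):
    """Greedy single pass: a gene joins the first cluster whose seed gene it tracks closely."""
    clusters = []  # list of (seed gene name, list of member gene names)
    for name, series in profiles.items():
        for seed, members in clusters:
            if pearson(series, profiles[seed]) >= threshold:
                members.append(name)
                break
        else:
            # No existing cluster matched, so this gene seeds a new one.
            clusters.append((name, [name]))
    return [members for _, members in clusters]

profiles = {  # toy values, not real measurements
    "rib_1": [1, 4, 9, 4, 1],
    "rib_2": [2, 8, 18, 8, 2],   # rises and falls with rib_1
    "g_3":   [5, 1, 5, 1, 5],
    "g_4":   [1, 2, 3, 4, 5],
}
print(cluster_coregulated(profiles))  # prints [['rib_1', 'rib_2'], ['g_3'], ['g_4']]
```

Treating each cluster as a single unit means a coordinated program, such as ribosome production, counts as one event rather than hundreds of independent changes.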
To this end, his team has devised statistical filters that distinguish meaningful gene expression shifts from random ones.
Another homegrown approach, a similarity metric, lets the group analyze the results of one experiment in light of related experiments. “The software sifts through the database to find an experiment close to yours, which can help you make connections you might not have thought of before,” says lab member Ron Taylor, a computer scientist.
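The article does not say how the lab's similarity metric is defined. As a hedged illustration only, the sketch below swaps in plain cosine similarity: it assumes each stored experiment is a dictionary mapping gene names to expression changes (for example, log ratios), compares only genes both experiments measured, and returns the closest match. The function names, experiment IDs, and values are hypothetical.

```python
import math

def similarity(a, b):
    """Cosine similarity over the genes measured in both experiments."""
    shared = a.keys() & b.keys()
    dot = sum(a[g] * b[g] for g in shared)
    norm_a = math.sqrt(sum(a[g] ** 2 for g in shared))
    norm_b = math.sqrt(sum(b[g] ** 2 for g in shared))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def closest_experiment(query, database):
    """Return (experiment_id, score) for the stored experiment most like the query."""
    best_id = max(database, key=lambda eid: similarity(query, database[eid]))
    return best_id, similarity(query, database[best_id])

database = {  # toy experiments, not real data
    "exp_heat": {"g1": 1.8, "g2": -0.9, "g3": 0.4},
    "exp_cold": {"g1": -2.0, "g2": 1.0, "g3": -0.5},
}
query = {"g1": 2.0, "g2": -1.0, "g3": 0.5, "g4": 0.1}  # g4 is ignored: not in the database
best_id, score = closest_experiment(query, database)
print(best_id)  # prints exp_heat
```

Restricting the comparison to shared genes keeps experiments from different array generations comparable, at the cost of ignoring genes only one of them measured.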
The database of gene expression measurements now under construction can reveal the “background distribution” for a particular gene, helping investigators determine whether a change is unusual or not.
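A background distribution also suggests one simple kind of statistical filter: measure how far a new value lies from a gene's past measurements in units of their standard deviation (a z-score). The sketch below is an assumption-laden illustration rather than the lab's filter; it treats the past values as roughly normal, uses a hypothetical cutoff of 3, and the toy history of log ratios is invented.

```python
import statistics

def z_score(history, new_value):
    """Distance of new_value from the gene's background mean, in standard deviations."""
    mean = statistics.mean(history)
    sd = statistics.stdev(history)  # sample standard deviation; needs at least 2 values
    if sd == 0:
        raise ValueError("background distribution has no spread")
    return (new_value - mean) / sd

def is_unusual(history, new_value, cutoff=3.0):
    """Flag a change only if it falls well outside the gene's usual range."""
    return abs(z_score(history, new_value)) > cutoff

history = [0.1, -0.2, 0.0, 0.3, -0.1, 0.2, -0.3, 0.0]  # toy past log ratios for one gene
print(is_unusual(history, 2.0))   # prints True  (about 10 standard deviations out)
print(is_unusual(history, 0.25))  # prints False (about 1.25 standard deviations out)
```

A shift that looks large in one experiment may be routine for a gene that fluctuates a lot; scaling by the gene's own spread is what separates the rare events from the common ones.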
But the expression data come from various sources, including the university’s cancer center, clinical research center, and other labs. So database management is now a major priority. “The point is to get data into its most useful form, with common terminology, and do it automatically,” Hunter says.
To that end, Hunter is working to create a Colorado Center for Excellence in Bioinformatics that would coordinate knowledge on cancer, diabetes, heart disease, Down’s syndrome, and other areas of biomedical research. “If this comes together, the boundaries between my lab and the new center will become fuzzy,” he says. And with the wealth of information available in the database, Hunter adds, “I’ll have a much greater potential for directly benefiting people, which was my motivation for getting in this field in the first place.”