Good bioinformatics techniques are essential for interpreting genomic data no matter where you are, but for Andrew Pohorille, director of NASA’s Center for Computational Astrobiology (NCCA), it’s even more important to glean as much knowledge as possible from raw data.
“You are much more forced to extract as much information as you can because repeating your experiment is not a simple thing,” Pohorille said of the gene expression studies that NASA has been conducting during recent space flights.
NASA is performing microarray experiments on kidney and bone cells as part of an effort to determine the physiological effects of long space flights. Researchers have already established that bone mass is lost in space, and gene expression studies are one way to gain a better understanding of the process. Recent microarray studies that NASA performed on kidney cells on Earth and in space showed statistically significant differences in expression for several hundred genes, Pohorille said.
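The article does not describe NASA’s statistical pipeline. As a generic illustration of how per-gene significance can be assessed when only a few replicates exist, the sketch below uses an exact permutation test on the difference in mean expression between flight and ground samples, followed by Benjamini-Hochberg control of the false discovery rate across genes. The function names, the two-sided test, the 0.05 threshold, and the example values are assumptions made for illustration.

```python
from itertools import combinations
from statistics import mean


def permutation_pvalue(flight, ground):
    """Exact two-sided permutation p-value for a difference in mean expression.

    Every way of relabeling the pooled measurements into groups of the original
    sizes is enumerated, so this is only practical for small replicate counts.
    """
    pooled = list(flight) + list(ground)
    n_flight = len(flight)
    observed = abs(mean(flight) - mean(ground))
    extreme = total = 0
    for chosen in combinations(range(len(pooled)), n_flight):
        chosen_set = set(chosen)
        group_a = [pooled[i] for i in chosen]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen_set]
        total += 1
        # Small tolerance so floating-point ties with the observed split still count.
        if abs(mean(group_a) - mean(group_b)) >= observed - 1e-12:
            extreme += 1
    return extreme / total


def benjamini_hochberg(pvalues, alpha=0.05):
    """Return sorted indices of genes called significant at false discovery rate alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= alpha * rank / m:
            cutoff = rank  # largest rank that passes the step-up threshold
    return sorted(order[:cutoff])


# Hypothetical log-expression values, four flight and four ground replicates per gene.
p_changed = permutation_pvalue([10, 11, 12, 13], [1, 2, 3, 4])
p_flat = permutation_pvalue([1, 2, 3, 4], [1, 2, 3, 4])
print(p_changed, p_flat)
print(benjamini_hochberg([0.001, 0.01, 0.03, 0.5]))  # [0, 1, 2]
```

With four replicates per condition there are only 70 ways to split the eight measurements, so even a perfectly separated gene cannot get a two-sided p-value below 2/70, about 0.029. That is one concrete way sparse data limits what a single experiment can show.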
A broad survey of what happens to humans and other organisms during space flights will guide NASA’s decision-making process as the space program develops. “If it turns out, for example, that gene expression in development is messed up, that might mean that we could travel in space but we cannot colonize space because new organisms cannot really develop properly,” Pohorille said.
But astronauts have limited time to devote to microarray experiments during space flights, so the NCCA has had to work with fairly sparse data sets. Much of the center’s bioinformatics work would be considered fairly conventional; comparing microarray data from space and Earth isn’t all that different from comparing diseased and healthy tissue. To overcome the challenges of incomplete information, however, other features of the program have a “specific NASA flavor,” according to Pohorille.
In a collaboration with Jeff Shrager of the Carnegie Institution of Washington and Pat Langley from the Institute for the Study of Learning and Expertise, the NCCA is developing a qualitative modeling system called Biolingua to reconstruct metabolic and regulatory pathways using incomplete information.
Pohorille noted that while the NCCA may have a particularly acute problem, many bioinformatics efforts are faced with a similar situation. “In most cases, even under well-controlled experiments, you don’t have complete information. You have a lot of genes that are unidentified functionally or are misidentified and you very rarely have enough of an understanding of biological processes that you can assign an enzyme to every process in a metabolic pathway.”
Biolingua uses artificial intelligence techniques to model pathways based on abstract representations of biological systems. It can interpret microarray data and suggest revisions to a model, which users can then accept to refine it further.
The system’s qualitative approach is distinctive, said Langley, because most simulation models work with quantitative information. “We’re not against quantitative modeling, there’s just some times when you don’t have enough information to justify that and you still want to be able to draw some conclusions.”
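The article does not detail how Biolingua encodes qualitative knowledge. As a generic illustration of drawing conclusions without rate constants, the sketch below uses qualitative sign reasoning: each influence between two components is marked only as activating (+1) or inhibiting (-1), and the predicted direction of change for a downstream component is the product of signs along every path from a perturbed node. Conflicting paths yield an ambiguous result ('?'), the characteristic limitation of this kind of reasoning. The function and node names are hypothetical.

```python
def qualitative_effects(edges, source, change=1):
    """Propagate a qualitative change (+1 up, -1 down) along signed influence edges.

    edges: iterable of (from_node, to_node, sign) with sign +1 (activates) or -1 (inhibits).
    Returns {node: +1, -1 or '?'} for every node reachable from source; nodes that are
    absent are predicted to be unaffected. Feedback loops are not followed.
    """
    graph = {}
    for src, dst, sign in edges:
        graph.setdefault(src, []).append((dst, sign))

    effects = {source: change}

    def walk(node, value, visited):
        for nxt, sign in graph.get(node, []):
            if nxt in visited:
                continue  # skip cycles; only simple paths are considered
            path_value = value * sign
            previous = effects.get(nxt)
            # Agreement keeps the sign; any disagreement between paths makes it ambiguous.
            effects[nxt] = path_value if previous in (None, path_value) else "?"
            walk(nxt, path_value, visited | {nxt})

    walk(source, change, {source})
    return effects


# Hypothetical network: stress activates geneA and geneB, geneA represses geneB,
# and geneB activates geneC.
network = [
    ("stress", "geneA", +1),
    ("stress", "geneB", +1),
    ("geneA", "geneB", -1),
    ("geneB", "geneC", +1),
]
print(qualitative_effects(network, "stress"))
```

Here geneA is predicted to rise, while geneB and geneC come out ambiguous because a direct activating path and an indirect repressing path disagree; resolving that would take exactly the quantitative information a qualitative model does without.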
Pohorille said models such as EcoCyc represent pathways concretely using information that is already understood. Biolingua, however, represents biological processes as abstract aggregates that are analogous to a generalized chemical reaction: “an input, an output, and something that happens in between.”
Using these generalized reactions and an incomplete data set, Biolingua can reconstruct a partial pathway and then generate all possible biochemical reactions that would effectively close the remaining gaps.
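Biolingua’s actual algorithm is not described in this article. The toy sketch below only illustrates the general idea under simplifying assumptions: each generalized reaction is reduced to a set of inputs and a set of outputs (the process in between is just a label), a target compound counts as reachable if it can be produced by forward chaining from the available compounds, and gap filling tests each reaction from a hypothetical candidate list to see whether adding it alone closes the gap. The class and function names, and the single-reaction restriction, are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reaction:
    """Generalized reaction: inputs, outputs, and a labeled process in between."""
    process: str
    inputs: frozenset
    outputs: frozenset


def rxn(process, inputs, outputs):
    """Convenience constructor using plain lists of compound names."""
    return Reaction(process, frozenset(inputs), frozenset(outputs))


def reachable(sources, reactions):
    """Forward-chain: fire any reaction whose inputs are all available, until nothing changes."""
    known = set(sources)
    changed = True
    while changed:
        changed = False
        for r in reactions:
            if r.inputs <= known and not r.outputs <= known:
                known |= r.outputs
                changed = True
    return known


def gap_fillers(sources, target, known_reactions, candidates):
    """Return candidate reactions that, each added on its own, make target reachable."""
    if target in reachable(sources, known_reactions):
        return []  # no gap to fill
    return [
        c for c in candidates
        if target in reachable(sources, list(known_reactions) + [c])
    ]


# Hypothetical partial pathway from A to D with a missing step between B and C.
partial_pathway = [rxn("step1", ["A"], ["B"]), rxn("step3", ["C"], ["D"])]
candidates = [
    rxn("bridge_BC", ["B"], ["C"]),
    rxn("side_AE", ["A"], ["E"]),
    rxn("shortcut_BD", ["B"], ["D"]),
]
for r in gap_fillers({"A"}, "D", partial_pathway, candidates):
    print(r.process)
```

Only bridge_BC and shortcut_BD are reported, since side_AE never leads to D. Trying combinations of two or more candidates, and ranking proposals by biological plausibility, would be natural extensions that this sketch omits.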
Shrager said that the system could scale much better than concrete models. “The beauty of abstraction is you can divide the system up into qualitative abstract descriptions and you can use those to compute the thing you need to compute and then go down to the details when you need to.”
Biolingua addresses what Langley sees as a need for additional formalization in biology. He described it as an interactive tool to aid biologists in interpreting data while retaining concepts familiar to them. Applications derived from machine learning approaches tend to represent knowledge in ways that are foreign to biologists, he said, while Biolingua would map directly to the terms used to describe biological models.
“Biolingua is simultaneously an attempt to capture in formal notation the kinds of models and hypotheses that biologists talk about in their scientific papers and to develop computational tools that can operate over those representations,” Langley said.
Shrager is currently applying the system to study photosynthetic pathways in cyanobacteria, but the developers said it should scale fairly easily to other species. It is of particular interest to NASA because it can be used to perform simulations on organisms in completely novel environments, such as those that may be encountered on other planets. The same techniques could also be used to model the effects of extreme environments on Earth, such as hydrothermal vents.
“The approach matches our needs particularly, but I think it matches everybody’s needs,” said Pohorille. He suggested running a Biolingua simulation before performing a knockout experiment to check for alternative reactions that could bridge the gap caused by the deletion of the gene.
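Pohorille’s suggestion can be illustrated with simple reachability reasoning. The sketch below is a hypothetical, self-contained example, not Biolingua code: each reaction is tagged with the gene assumed to catalyze it, a knockout removes every reaction tagged with that gene, and the question is whether the target compound is still reachable through alternative reactions. The function names, compounds, and genes are invented for illustration.

```python
def reachable(sources, reactions):
    """Compounds producible from sources; reactions are (inputs, outputs, gene) tuples."""
    known = set(sources)
    changed = True
    while changed:
        changed = False
        for inputs, outputs, _gene in reactions:
            if set(inputs) <= known and not set(outputs) <= known:
                known |= set(outputs)
                changed = True
    return known


def knockout_bypassed(sources, target, reactions, gene):
    """True if target stays reachable after removing every reaction catalyzed by gene."""
    remaining = [r for r in reactions if r[2] != gene]
    return target in reachable(sources, remaining)


# Hypothetical network: two routes from A to C, then a single step from C to D.
network = [
    (["A"], ["B"], "geneX"),
    (["B"], ["C"], "geneY"),
    (["A"], ["C"], "geneZ"),  # alternative route that bypasses geneX and geneY
    (["C"], ["D"], "geneW"),
]
for gene in ["geneX", "geneY", "geneZ", "geneW"]:
    print(gene, knockout_bypassed({"A"}, "D", network, gene))
```

In this toy network, deleting geneY is predicted to leave D unaffected because the geneZ reaction bridges the gap, while deleting geneW blocks D entirely. A prediction like that is what one would want to check before committing to the real knockout.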
Pohorille, Shrager, and Langley have applied for a NASA grant for the Biolingua software package. The NCCA is currently working with around $150,000 in funding from the Fundamental Biology program at NASA as well as the agency’s discretionary funds.
Langley said that the rate of Biolingua’s development would depend on the level of funding as well as the amount of interest seen from others in the field. The package will be available for free to academic users at the project’s website (www.biolingua.org), which is set to launch in the next few weeks.