Bernhard Palsson is an in silico biology pacesetter who caught the systems biology bug well before the term was even coined, in his undergraduate research on the kinetics of heart enzymes in rats. “I realized that an enzyme isn’t alive — you have to get all of them together to figure out what they do together, to understand the living process, at least at the molecular level,” he said. He went on to publish the first dynamic model of a human cell — metabolism in a red blood cell — in 1989, and followed with the first genome-scale metabolic models of Haemophilus and E. coli in 1999 and 2000, respectively. Over the past couple of years, Palsson and his collaborators modeled the metabolism of Saccharomyces cerevisiae.
BioInform spoke to Palsson recently to get his take on the rapidly growing field of computational systems biology and where he sees the network-modeling world going over the next few years.
Can you describe your view of systems biology and the role of sub-cellular simulation in that process?
As a general comment, I would say that systems biology is not new. [It dates] back to 1960 or so when people discovered the first control circuits inside cells. What followed after that is that molecular biology really grew as a discipline, and in the ‘80s, many of the experimental methods that were used in the field started to be scaled up, and, of course, by the mid-’90s, this reached the genome scale. And as all this data started pouring out, the field of bioinformatics was born, but the field of bioinformatics … is now looking for a mechanistic basis for cellular functions. They describe their main challenge right now as integration of diverse data types, and that is, of course, what we call reconstruction.
So that’s one of the routes to where systems biology is today, the molecular biology route. People think of systems biology more at the molecular or cellular level, and I think it’s important to point that out, because physiology, for instance, has been characterized by the systems study of biology forever. So systems biology is a matter of scale — we’re looking at molecules and cells, not organs or whole bodies — and it’s a genome-enabled science, because now we have the sequences and can enumerate these components. It’s not enough to be looking at one lac operon, but at the whole cell.
There’s also a systems route, and that is that in the early 60s, people started simulating mathematically things like the lac operon … but it wasn’t until Venter put out that first sequence [of Haemophilus influenzae in 1995] that it became possible to develop genome-scale models. So that route to systems biology was basically systems analysis — lots of control theory, kinetic equations, kinetic theory. So systems biology basically has two historical routes, and the field now is really desperately trying to combine these two.
I think what will happen over the next five years or so is that we will probably be able to reconstruct fully transcriptional regulatory networks in E. coli or in Haemophilus or B. subtilis. Many of the companies [entering this field] are marketing dynamic models of signaling pathways, but the reality is that it will be many years before we can reconstruct those pathways. The experimental work we are doing suggests that we only know about 20-25 percent of the transcriptional regulatory network in E. coli, so 75 percent of it is yet to be discovered. I would say that if you really want to understand the way the system works, it’s probably premature to build models and read too much into them. The analogy I like to draw is with language. If you know one out of four words in a sentence, there’s no way you’re going to understand what that sentence is saying.
What is the role of experimental biology in reconstructing these networks?
It’s a misconception that biology is data-rich today. It’s not data poor the way it was, but that doesn’t mean it’s data rich. So I think biology will continue to be driven primarily by experiment. I know it’s fashionable to say — and I’ve written this myself — that there’s this tidal wave of data and we need to integrate and analyze it. That may be true, but there are many more tidal waves coming. So the role of experiments is very important, and therefore, this lab, for instance, focuses on three different things: one is experiments; number two is reconstruction of networks; and then we develop mathematical methods or in silico methods to analyze the properties of these reconstructions.
In early 2003, you published a paper describing the reconstructed metabolic network for yeast [Genome Research 13:244-253]. Then, in November, you described the use of that network for phenotype prediction [PNAS, 100: 13134-13139]. Can you discuss the significance of this progression of your research?
I would say in 2003, the field of modeling in systems biology, particularly at the genome scale, went from a retrospective analysis of data to driving experiments prospectively. We are doing that now in the yeast as well as the E. coli model. So there’s a very subtle but important implication of that with respect to modeling philosophy. If you earn a degree in physics or engineering, you use the models to describe what you know. Here, it may be the inverse: You’re trying to get the models to describe what you don’t know. And, in fact, the results of these models are more the failures than the successes. If you look at the recent yeast paper where we looked at 600-700 knockouts, and we failed to predict accurately in 100 cases, the results are really a case-by-case analysis of those 100, trying to figure out why we failed, and what experiments need to be done to determine whether we know why we failed. Of course, if you can determine that, and build it into the model, then you no longer fail in that prediction.
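The knockout-prediction loop Palsson describes can be pictured with a toy constraint-based model. This is a hedged sketch, not his group’s actual method (their genome-scale models solve a linear program over the stoichiometric matrix): here growth is predicted simply by checking whether the biomass precursor is still producible from the nutrients after a reaction is deleted, and all metabolite and reaction names are invented for illustration.

```python
# Toy network-based knockout phenotype prediction (illustrative only;
# real genome-scale models optimize flux over the full stoichiometry).

def reachable(reactions, nutrients):
    """Metabolites producible from the nutrients, given reactions as
    {name: (set_of_substrates, set_of_products)}."""
    have = set(nutrients)
    changed = True
    while changed:
        changed = False
        for subs, prods in reactions.values():
            if subs <= have and not prods <= have:
                have |= prods
                changed = True
    return have

def predict_growth(reactions, nutrients, biomass, knockouts=()):
    """Growth is predicted iff every biomass precursor remains
    reachable once the knocked-out reactions are deleted."""
    kept = {r: v for r, v in reactions.items() if r not in knockouts}
    return biomass <= reachable(kept, nutrients)

# Hypothetical mini-network: two redundant routes from A to B.
reactions = {
    "r1": ({"glc"}, {"A"}),
    "r2": ({"A"}, {"B"}),    # main route
    "r3": ({"A"}, {"C"}),    # alternative route...
    "r4": ({"C"}, {"B"}),    # ...rejoining at B
    "r5": ({"B"}, {"biomass"}),
}
print(predict_growth(reactions, {"glc"}, {"biomass"}))               # True
print(predict_growth(reactions, {"glc"}, {"biomass"}, {"r2"}))       # True: reroutes via r3/r4
print(predict_growth(reactions, {"glc"}, {"biomass"}, {"r2", "r4"}))  # False: lethal
```

A wrong prediction in either direction is exactly the informative outcome he describes: it points at a reaction or regulatory interaction the reconstruction is missing.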
What are you doing to improve the false prediction rate?
We are trying to understand how E. coli regulates its metabolism when oxygen is shifted, and we have used a model to calculate which are the most informative knockouts.
So we made transcription-factor knockouts, and are taking those knockout strains through the same shift. Then you look at the expression shift in the wild type and compare it to the knockouts, and can therefore decipher whether the transcription factor is involved in gene expression or not. That experimental data has now been matched against the transcriptional regulatory network that has been reconstructed, and has led to a list of 110 well-defined hypotheses, and they’re all amenable to testing. But the problem is that there are too many. If you wanted to go and do this by classical genetics and biochemistry, it would take you a long, long time. One of the lessons that we are learning from this first iteration of prospective experiments is that you now need high-throughput methods to validate or refute hypotheses. So one of the results is that we need more technology generation to generate the right kind of data.
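The wild-type-versus-knockout comparison he outlines can be sketched in a few lines. This is an illustrative toy, not the lab’s analysis pipeline: gene names, expression values, and the threshold are all invented, and real analyses would use replicate measurements and statistics rather than a fixed cutoff.

```python
# Illustrative comparison of expression shifts (wild type vs. TF knockout).
# All names, numbers, and the threshold are invented for demonstration.

def shift(before, after):
    """Per-gene expression change across the oxygen shift."""
    return {g: after[g] - before[g] for g in before}

def candidate_targets(wt_shift, ko_shift, threshold=1.0):
    """Genes that shift in the wild type but not in the knockout --
    candidates for regulation by the deleted transcription factor."""
    return sorted(
        g for g, d in wt_shift.items()
        if abs(d) >= threshold and abs(ko_shift[g]) < threshold
    )

wt = shift({"geneA": 2.0, "geneB": 5.0}, {"geneA": 6.0, "geneB": 5.2})
ko = shift({"geneA": 2.1, "geneB": 5.1}, {"geneA": 2.3, "geneB": 5.0})
print(candidate_targets(wt, ko))   # ['geneA']: shift lost in the knockout
```

Run across hundreds of genes and many knockout strains, each hit is one of the “well-defined hypotheses” he mentions — which is why the list grows too long for classical genetics to work through one at a time.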
You co-founded Genomatica several years ago. How does the work in your lab feed into Genomatica’s product pipeline?
The company holds licenses to quite a bit of IP out of this lab. I was intimately involved in developing the company, but over the last six to eight months, it [has] sort of moved beyond the phase where a founder is important for a company and is taking on its own management and its own lifetime and so forth. So I’m diverging from it in that sense.
Whether more IP will come out of it, I don’t know. I will say that if they get their patents issued as filed for, these patents will be of similar fundamental importance to the in silico biology field as Affymetrix’s [have been] to microarrays. The claims that have been filed for are pretty broad and fundamental, and without any prior art on file.
In sequence-based bioinformatics, companies didn’t really patent that much, and they also didn’t do very well commercially. Do you think that in silico modeling will have a better chance at commercial success if firms in this area rely more on patenting?
If you patent a useless thing it doesn’t really matter if you have a patent or not, so it’s really a question of whether there is useful content in these databases growing over time. And we can observe some history here. There are databases in organic chemistry that have been around for 15 years or more, and they seem to keep their value over time and, in fact, grow in value. Biological databases, on the other hand, seem to have a decaying value over time. Now, I would argue that a reconstruction of an organism will only grow in value over time because Salmonella is not going to go away, Staphylococcus aureus is not going to go away. Of course, there is only a subset of organisms that will have real commercial value, and in the bacterial space it will be the human pathogens, probably, and some of the bioprocessing organisms, and potentially some of the organisms of environmental importance. Computer models of the standard model organisms will probably have value in a research/academic market — E. coli, yeast, C. elegans, or Drosophila — but, of course, the models with the ultimate, gigantic market value are of human cells.
So how far are we from getting those models? For metabolism, it’s near-term — metabolic models of single human cells or interacting human cell types are achievable today. Is the same thing going to be true for signaling? That’s a question that everybody is asking given the fact that at least half of the drug targets being studied are in signaling. You probably won’t expect that to happen in the near term, because the reconstructions just aren’t there, but in ten years’ time you probably will have pretty good models of the major signaling networks in cells. So if you’re a VC, that would put you to sleep because that’s way too long. But for metabolism, if you give me the money today to do it, in two to three years we could have the models.
Are you working on modeling human cells now in your research?
In my lab, we are debating that a bit. We are building a model of human mitochondria because the data is becoming available for that right now, and as it turns out, mitochondria are involved in a rather large number of human diseases. But aside from mitochondria, I think we need to wait a little longer before we have the right data to do it.
What kind of data would be most helpful for you to have in order to advance your models?
One of the surprising features of genome-scale models is that the number of functional states of networks is much, much greater than the number of components. If you look at a small network with six to eight components, they may have two or three functional states, but there’s a transition somewhere around 30, 40, 50 components where the number of new functional states starts to grow much faster than the number of components you’re adding into it. So adding 10 new components may give you 500 new possible states. Of course, this is one reason why people are so shocked that a human can be created with 30,000 genes. So the data I’d like to be able to get now is the data that allows us to measure these functional states.

And in metabolism, this comes down to what’s called fluxomics — to measure the flux distribution through a network. People seem to understand that best through traffic analogies. So if you look at a street map, having the map is not enough if you want to know what the functional state is: If a street is closed, is everything going to back up, or are there many alternative routes? Fluxomics is developing as a field, and people are now measuring states in core metabolic pathways in E. coli that are totally different than the pathways described in classical textbooks.

Then, of course, we need to know how the regulatory system picks the functional states. It may have thousands of functional states available to it, but it may take, say, ten that are useful to it. And what’s useful, of course, is determined by the historical background of the organism — how it evolved. So we need to measure those states, and then try and figure out the regulatory structures that produce those states, and how robust those regulatory structures are.