Hans Westerhoff, a professor of molecular cell physiology and mathematical biochemistry at the Free University of Amsterdam and the University of Amsterdam, leads a project called Silicon Cell (http://www.siliconcell.net) that is envisioned as a live database of models that can replicate cellular behavior. Westerhoff’s own research focuses on glycolysis, and specifically how the pathogen Trypanosoma brucei thrives on blood sugar when it resides in the human body. He envisions Silicon Cell, a collaborative effort with Jacky Snoep, a professor of biochemistry at South Africa’s University of Stellenbosch, as a resource for researchers working on similar projects to log in and perform in silico experiments of their own, changing inputs and constants regulating the metabolism and gene expression to calculate the cell’s appropriate response.
BioInform spoke to Westerhoff recently about his vision and goals for Silicon Cell.
Can you tell me a bit about the Silicon Cell project and where it fits into the broader spectrum of computational systems biology?
The philosophy behind Silicon Cell is about systems biology, where systems biology — or at least part of it — is defined as the science that studies the functional properties that arise in systems of molecules, or systems of organelles. And new properties come about by the interactions of the molecules or organelles (or organs). Well, if this is so, then if you know the capabilities of the molecules that interact — if you have measured that experimentally — then you can do the rest by computer. You just put all the behavior of the molecules, their individual behavior, in the computer, plus their ability to interact with other molecules, and you let the computer run a simulation of the actual behavior of the cell.
So basically then, what we’re talking about is a replica of the living cell. That’s why it’s called the Silicon Cell, even though that’s a bit ambitious. It’s really the ambition to have ultimately one computer model as a replica of the entire cell, or even of the entire organism, but we’re not there yet. So what we’re doing now, and what other groups are doing now, is accomplishing this for parts of a living cell, like a metabolic pathway or gene expression pathway or signal transduction pathway. But the philosophy is still the same: This is not modeling how it might behave approximately; this is the calculation of how it should behave because you put in the precise experimental properties of the molecules.
The extended philosophy of this is to put all these models on a website (http://www.jjj.bio.vu.nl/) so that you can actually run them. You can click and see how the concentrations of the molecules behave as a function of time; you can also enhance the expression of a gene, the amount of a protein, and then see what happens. So you can sort of genetically engineer on the web, with a replica of the living cell.
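The kind of in silico experiment described here can be sketched in a few lines of Python. This is a toy illustration only, not one of the Silicon Cell models: the two-step pathway S -> X -> P, the Michaelis-Menten rate law, every parameter value, and the names `mm_rate` and `simulate_pathway` are assumptions made for the example. It follows the concentration of an intermediate X over time, then repeats the run with twice as much of the second enzyme.

```python
def mm_rate(vmax, km, conc):
    """Irreversible Michaelis-Menten rate law (assumed for this sketch)."""
    return vmax * conc / (km + conc)


def simulate_pathway(enzyme2_level=1.0, t_end=50.0, dt=0.01):
    """Euler integration of the toy pathway S -> X -> P with S held constant.

    enzyme2_level scales the Vmax of the second enzyme, mimicking a change
    in its expression. Returns a list of (time, [X]) pairs.
    """
    s = 1.0  # fixed substrate concentration (hypothetical value)
    x = 0.0  # the intermediate starts at zero
    course = [(0.0, x)]
    for step in range(1, round(t_end / dt) + 1):
        v1 = mm_rate(1.0, 0.5, s)                  # enzyme 1 produces X
        v2 = mm_rate(2.0 * enzyme2_level, 1.0, x)  # enzyme 2 consumes X
        x += dt * (v1 - v2)
        course.append((step * dt, x))
    return course


if __name__ == "__main__":
    for level in (1.0, 2.0):
        t, x = simulate_pathway(enzyme2_level=level)[-1]
        print(f"enzyme 2 at {level:.0f}x: [X] = {x:.3f} at t = {t:.0f}")
```

With these made-up parameters the intermediate settles near 0.5 at the normal enzyme level and near 0.2 when the second enzyme is doubled, which is the sort of "enhance the expression and see what happens" question the website is meant to answer with measured kinetics instead of invented ones.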
What’s available on the website now?
You’ll now find a number of pathways (I think it’s maybe 20 now), and they come from a number of groups all over the world. They are based on refereed papers. There is also now a connection to two major journals in the field: the European Journal of Biochemistry and Microbiology. Just like for gene sequences, where they want people to submit to the nucleotide sequence databases, they want authors of papers that present models of processes of living cells to submit those models to this website so that the referee can play, so to speak, with the model on the web and see that it actually does what the authors claim it does.
That is also part of the initiative. And something that is the big aspiration now is the issue of [whether] we can link these models. The number of models is now increasing, so it becomes relevant to ask: if we have a model, say, of mitochondrial metabolism and a model of glucose metabolism, and both of them have something to do with pyruvate, can we now link the two models? That’s the ambition, so that ultimately this thing will indeed grow to a model of a living cell: a yeast cell, or an E. coli cell, or a human liver cell.
How do you envision these models being used?
A number of discoveries have already been made with this. For example, this was done for probably the best known organism, we thought: yeast, Saccharomyces cerevisiae, which has been studied for over 100 years. So the enzyme properties were available in the literature, but nobody stitched it together. So we put it together and ran it in this computer replica, and then it turned out that it didn’t work. What we obtained was a metabolic explosion, basically. It’s like [if you have] an interstate and it turned out that in some part of the interstate, say between New York and Washington, people could drive very, very fast because the road was very broad, and then between Washington and Chapel Hill, the road was a lot narrower. Consequently an enormous traffic jam developed around Washington. So you’ve got a buildup basically in the middle of the pathway, where too much of that substance was built up and was not taken away, and that would have exploded the cell. This, of course, was not seen in experimental practice, so then we found a new regulatory effect that serves as a brake on the beginning of this pathway so that it can’t speed — it can’t overrun itself. You can see that with this procedure you can better understand how the system behaves and why it behaves like this, what the functional sum of the molecular components is, and discover unexpected functions for components, like this brake.
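The traffic-jam picture can be reproduced with a deliberately small toy model. The sketch below is not the published yeast model: it assumes a single intermediate, a fast supply step, a slower saturable removal step, and a "brake" written as simple feedback inhibition of the supply step by the intermediate. The function name `simulate_intermediate` and all numbers are hypothetical.

```python
def simulate_intermediate(with_brake, t_end=100.0, dt=0.01):
    """Euler integration of one intermediate X in a toy two-step pathway.

    Supply of X runs at up to 2.0 units per time; removal saturates at 1.0.
    With the brake on, X inhibits its own supply (assumed form of the brake).
    Returns [X] at t_end.
    """
    x = 0.0
    for _ in range(round(t_end / dt)):
        supply = 2.0                   # broad road: fast upstream step
        if with_brake:
            supply /= 1.0 + x / 1.0    # feedback inhibition, Ki = 1.0 (assumed)
        removal = 1.0 * x / (1.0 + x)  # narrow road: Vmax = 1.0, Km = 1.0
        x += dt * (supply - removal)
    return x


if __name__ == "__main__":
    print("without brake:", round(simulate_intermediate(False), 1))
    print("with brake:   ", round(simulate_intermediate(True), 3))
```

Without the brake, supply always exceeds the maximum removal rate, so the intermediate keeps piling up for as long as the simulation runs. With the brake, the same pathway settles at a finite level (2.0 for these parameters), which is the qualitative behavior the interview describes.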
Similarly, Barbara Bakker, Paul Michels, and I also found a drug target in Trypanosoma brucei, which causes sleeping sickness. By making this replica, we could figure out, okay, if we now hit that particular step [in the pathway], then the organism should die, and this was not a step that everybody worked on. The step that many people in the field work on turned out to be not so important. So there are a couple of discoveries that have already been made by doing this procedure.
So you’re not focusing on one particular cell type?
The principle is to work on all cells. Of course there’s a historical effect because there [are] a number of cells where at the moment it’s easier to do this work, because you need the molecular information, and that information is hard to have complete. So the particular organisms that have been used for a very long time already as model organisms, like yeast and E. coli — they are the organisms for which you now find the most entries in the database of models. But there is one on Xenopus laevis, which is the South African clawed frog, which is important for developmental biology, and there is a model for the human red blood cell. So the focus depends on where the progress is and has been. It’s a bit scattered, but it’s in principle all organisms.
There are a number of projects building cellular-scale models of E. coli already, so are you collaborating with those projects or is Silicon Cell taking a different approach?
In the E. coli field, there is Bernhard Palsson’s initiative [at the University of California, San Diego], and although he says he models the whole cell, it’s a different type of modeling. Basically what he does, if you look at it as an interstate map of the United States, he measures how much traffic there is on every interstate. What we’re doing here is [like] making a replica of the behavior of those interstates — that is, we figure out why there is so much traffic on every interstate. What determines the traffic? Whether it’s the people who want to move between Washington and New York, or whether they want to move between Washington and Boston. It’s really the behavior of the living system that’s done in the replica. If you put it on a human scale, we need to know the motives of the humans — whether they want to go to Washington or Boston, and we need to know the motives of the enzymes — whether they want to make ATP or GTP, and at what rate. It goes a bit deeper, therefore. The other initiatives are fantastic also, but they are in that sense different.
Is most of the data being used to build these models from the literature, or do you also perform wet lab experiments to gather new data?
I would guess that 80 percent or more of these models come from a lab that is both wet and dry. These people have an interest experimentally in the model, so what they will do is also tie it in with the literature, because you don’t want to throw away the literature. So they will make an initial replica based on data in the literature, but they will find, as we found in our examples, that in the literature one enzyme has been measured at pH 6.5, and another has been measured at pH 8, and we know that the pH in the cell is 7.5. Then the data you get from the literature are useful, but not quite the right data. At that moment, the experiment needs to be redone in our lab or the lab of the group that makes the model, and then you find the precise value.
That’s not so much work because you can use the method that was published by the authors. You just have to slightly adjust it for the precise conditions.
Can you quantify the amount of data you have right now in these different pathways that are available?
One aspect of systems biology and genomics is enormous amounts of data, but one aspect of this [approach] is that we get an enormous curation of data, and an enormous reduction in the number [of data]. Because what is in the live database — or the modelbase, so to speak — is only those data that belong together, that are really relevant for the right acidity of the cell, for the right ionic strength, for the right temperature, for the right pressure. So, in practice, for every property, this means that maybe 1 out of 100 of all the known data of that property is relevant, so you get a reduction in the amount of data per data point by maybe a factor of 20 or 100. Of course, if you then do all the possible permutations, if you have 100 relevant data points, you get a reduction of 100 to the power of 100 for all possible combinations. So this has an enormous effect. Therefore, our data set is much smaller. In a sense, that is a negative because it doesn’t contain all the data, but it is also a positive because it contains all the useful data.
As for the data in a typical model now: these models typically have around 10 reactions, and every reaction has approximately 7 parameter values, so each model has around 70 parameter values.
Many people say that there’s still not enough data to really build accurate models.
They are absolutely correct. That is also the reason why the Silicon Cell initiative has only replicas for 20 parts of living cells. If you say that a cell, say yeast, is 6,000 reactions, the yeast model [in Silicon Cell] is 15, so it covers one out of 400 [reactions] in the yeast cell. Yes, that’s because so few kinetic data of sufficient quality are available. So they are right, but hopefully the effect is that many labs will now, rather than collecting quasi at random whatever data they get, collect specifically the data that are needed to make these replicas of living cells. Because there is a lot of data we don’t need, this may make functional genomics more economical.
Do you have a specific timeline for what you expect to be available as part of Silicon Cell and when?
The first four models went live two years ago, and these are being added to on a day-to-day basis, but it’s not that quick. Every time a well-refereed model appears in a journal, it’s added to the database. As to the question of when we expect the first linkage of models to occur, the first one is actually being tried out by Jacky Snoep. That’s for two fairly small models now, just to try everything out, and it seems to be successful, but that’s partly because it’s a simple case. Another timeline is the submission of other groups to the database. Other groups have been submitting, so this is now working, but it should still be on the increase.
When will we have enough to do an entire cell? That’s probably never, but when will we have enough to do 30 percent of a cell? Maybe seven years from now for Saccharomyces cerevisiae or E. coli; that sort of timeline. When do we do human beings? The first pathways are there, but that will probably be 20 years from now.
But of course, there is a great philosophy behind this. My dream is that there will be a standard model for human physiology in terms of its molecules, and that for every individual, say 20 years from now, we will determine the polymorphisms that we can put into the computer model. Then we shall have a first approximation of a computer model for every individual. Then, when you go to the doctor when you are sick, he or she will still perhaps conservatively decide in the same manner as now. Upon that decision, however, he or she or even you yourself will try it out on your computer replica and find that a higher or lower drug dosage or a different drug might be better for you.