University of Washington
David Baker, a computational biologist at the University of Washington in Seattle and a Howard Hughes Medical Institute investigator, is known for developing Rosetta, a suite of software for predicting the three-dimensional shapes of proteins.
Recently, he and his fellow researchers developed a refinement of Rosetta that they used to accurately predict the structure of a 112-residue globular protein using only its amino acid sequence — the first time this has been accomplished, according to the researchers.
The computational method, detailed in the Oct. 14 issue of Nature, addresses one of the biggest challenges in computational structure prediction: “interim structures,” partially folded states in which proteins get “stuck.” In the article, Baker and colleagues describe an approach called “targeted rebuilding and refining,” in which Rosetta identifies the regions most likely to lead to misleading interim structures and isolates them for “targeted rebuilding.”
The method is also expected to improve experimental methods for determining protein structures by addressing the so-called “crystallographic phase problem.” This occurs when researchers convert X-ray diffraction data into electron density maps of proteins and must infer the phases associated with each diffraction peak. Typically, crystallographers rely on structurally similar molecules to serve as templates for this inference step. In cases where there are no structurally similar molecules available, computational methods would prove a useful alternative, but have had limited success to date.
To power the study, more than 70,000 PC users around the world downloaded Rosetta@home, a distributed version of Rosetta that is based on the Berkeley Open Infrastructure for Network Computing platform.
BioInform spoke to Baker this week by phone about his work and about the project, whose user base had grown far beyond 70,000 as of press time.
Tell me how you got 100,000 users involved in this.
We started off much smaller than that. … It's been growing gradually since it started. It says, according to the statistics here, that 186 new users joined yesterday. There are 165,468 now.
And so every day, people hear about it; they read articles … and they get interested in contributing the power of their computer to biomedical research.
How do these users power your efforts?
We are doing a lot of different things, but in all of them, they require finding a needle in a haystack. We are trying to find the correct needed structure of a protein or find a design that will inhibit a pathogen. We are always looking for very, very rare things and don't know exactly where to look, so basically, what happens is … when we are trying to solve a problem we will send out something like 100,000 or maybe even a million different starting points for different computers to look at and solve the problem.
Each computer is looking in a different place for this very rare thing, whether it's the correct structure of the protein or the best design to inhibit a pathogen or a DNA-cutting enzyme. So when we find the right answer, we know it, but it's very, very hard to find. And so basically, each computer looks in a different place. That's a simple way to put it.
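The "each computer looks in a different place" idea can be illustrated with a toy multi-start search. This is only a sketch under stated assumptions, not Rosetta's algorithm: the one-dimensional `toy_energy` function, the greedy downhill walk in `work_unit`, and the `run_project` helper are all hypothetical stand-ins for a real protein energy function and real work units.

```python
import math
import random


def toy_energy(x):
    """Hypothetical rugged 1-D 'energy landscape' with many local minima."""
    return 0.1 * (x - 3.0) ** 2 + math.sin(5.0 * x)


def work_unit(seed, steps=2000, step_size=0.05):
    """Stand-in for one volunteer computer.

    Starts from its own random point and accepts only downhill moves,
    so it ends up in whichever local minimum is nearest its start.
    """
    rng = random.Random(seed)
    x = rng.uniform(-10.0, 10.0)
    energy = toy_energy(x)
    for _ in range(steps):
        trial = x + rng.uniform(-step_size, step_size)
        trial_energy = toy_energy(trial)
        if trial_energy < energy:
            x, energy = trial, trial_energy
    return energy, x


def run_project(n_units):
    """Send out n_units starting points and keep the lowest-energy result."""
    results = [work_unit(seed) for seed in range(n_units)]
    return min(results)


best_energy, best_x = run_project(200)
print(f"lowest energy found: {best_energy:.3f} at x = {best_x:.3f}")
```

A single work unit usually gets trapped in a nearby local minimum; only by spreading many starting points across the landscape does at least one of them land in the deepest well.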
And when was it you felt you had found the right answer?
Basically, the correct answer has the lowest energy and we can be confident that — well, we can't be totally confident ever without some independent data to validate it, but if many people are finding the same lowest-energy structure, if it appears multiple times and is significantly lower than the other structures that are found, then we can be pretty confident that that's the right one.
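The rule of thumb Baker describes (the same lowest-energy structure turns up repeatedly and sits well below everything else) can be written as a small check. The sketch below is an assumption-laden illustration: each result is reduced to a hypothetical `(energy, coordinate)` pair, a simple coordinate difference stands in for a real structural comparison, and the thresholds `same_tol`, `min_hits`, and `min_gap` are arbitrary illustrative values, not figures from the paper.

```python
def assess_lowest_energy(results, same_tol=0.1, min_hits=3, min_gap=0.5):
    """Judge whether the lowest-energy result looks trustworthy.

    results: list of (energy, coordinate) pairs from independent runs.
    Runs whose coordinate lies within same_tol of the best one are
    counted as having found the same structure.
    """
    ranked = sorted(results)
    best_energy, best_x = ranked[0]
    same = [e for e, x in ranked if abs(x - best_x) <= same_tol]
    other = [e for e, x in ranked if abs(x - best_x) > same_tol]
    # Energy gap between the best structure and the best *different* one.
    gap = other[0] - best_energy if other else float("inf")
    return {
        "best_energy": best_energy,
        "hits": len(same),
        "gap": gap,
        "confident": len(same) >= min_hits and gap >= min_gap,
    }


converged = [(-10.0, 2.00), (-9.9, 2.03), (-9.8, 1.98), (-7.0, 5.0), (-6.5, -1.0)]
scattered = [(-10.0, 2.0), (-9.8, 5.0), (-9.7, -1.0)]
print(assess_lowest_energy(converged)["confident"])
print(assess_lowest_energy(scattered)["confident"])
```

As the interview notes, a check like this only raises confidence; independent experimental data is still needed to confirm a prediction.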
How do you [validate] your results?
In the paper, what we did was to show that … the lowest energy models we were finding were good enough to help people solve crystal structures really fast.
Now, 3D modeling for protein structure prediction — maybe not specifically this type of research, but 3D modeling in general — while attracting a lot of attention has also garnered a lot of skepticism. How do your results help to dispel that notion?
I would say the methods for predicting structure have been improving over the years. We published a paper in Science, I think in 2005, showing that you could get high accuracy predictions from these types of methods, high-accuracy models [Bradley, P, Misura, KM & Baker, D. (2005) Science 309, 1868–1871]. And I guess that was really the first statement that you could do this. And now with this paper, we've gone on to show that not only can we produce accurate models, but you can do things with them.
Can you describe for me, please, the crystallographic phase problem and why your protein structure prediction method will serve this task?
When you solve a structure by X-ray diffraction, you collect half of the information you need to solve the structure. You collect what are called the amplitudes of the diffraction, but to actually solve the structure you need also what are called the phases that go with those diffraction amplitudes and you can't measure those directly in a single experiment.
But if you have a model that's pretty close, then you can use that model to get … an estimate for the phases and solve the structure that way. So the model can provide the missing half of information you would need to solve the structure from a single crystallography experiment.
Now in practice, the phase problem is solved experimentally by making multiple crystals, getting crystals from multiple, slightly different versions of your protein and then from those multiple data sets you can solve the phase problem.
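A one-dimensional toy calculation shows why a reasonably close model can stand in for the missing phases. This is a sketch under simplifying assumptions, not a crystallographic workflow: the eight-point `true_density` and `model_density` arrays are made up, a plain discrete Fourier transform stands in for diffraction, and all function names are hypothetical.

```python
import cmath


def dft(density):
    """Discrete Fourier transform: density -> complex 'structure factors'."""
    n = len(density)
    return [sum(density[j] * cmath.exp(-2j * cmath.pi * j * k / n)
                for j in range(n)) for k in range(n)]


def inverse_dft(factors):
    """Inverse transform: complex factors -> real-space map."""
    n = len(factors)
    return [sum(factors[k] * cmath.exp(2j * cmath.pi * j * k / n)
                for k in range(n)).real / n for j in range(n)]


def combine(amplitudes, phases):
    """Pair each amplitude with a phase to rebuild a complex factor."""
    return [cmath.rect(a, p) for a, p in zip(amplitudes, phases)]


def rms_error(a, b):
    return (sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)) ** 0.5


true_density = [0, 0, 3, 5, 1, 0, 0, 0]   # what the experiment is after
model_density = [0, 0, 2, 5, 2, 0, 0, 0]  # a close but imperfect prediction

# The experiment yields only the amplitudes; the phases are lost.
amplitudes = [abs(f) for f in dft(true_density)]
true_phases = [cmath.phase(f) for f in dft(true_density)]
model_phases = [cmath.phase(f) for f in dft(model_density)]
zero_phases = [0.0] * len(amplitudes)

map_true = inverse_dft(combine(amplitudes, true_phases))
map_model = inverse_dft(combine(amplitudes, model_phases))
map_zero = inverse_dft(combine(amplitudes, zero_phases))

print("error with true phases: ", rms_error(map_true, true_density))
print("error with model phases:", rms_error(map_model, true_density))
print("error with no phases:   ", rms_error(map_zero, true_density))
```

Measured amplitudes combined with phases borrowed from the approximate model give a map much closer to the true density than the amplitudes alone, which is the sense in which the model supplies the missing half of the information.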
Can you describe for me how the Rosetta@home distributed computing project is set up and how it interacts with the BOINC platform?
Rosetta@home was only possible because of the generosity of David Anderson [research scientist with U.C. Berkeley Space Sciences Laboratory], who developed the Berkeley Open Infrastructure for Network Computing, BOINC. They developed the whole distributed computing software and mechanism for their SETI@home project and then David realized that … this kind of computing power could be useful for other scientific problems. He helped us a lot, gave us all the software that they had developed for distributed computing, so Rosetta@home is completely reliant on that.
Who developed Rosetta?
That was developed by my group, and worked on by many scientists. It started about 10 years ago, and since then many of the students and postdocs who've left my group and started their own academic groups have continued to work on Rosetta. So now it's really a community of scientists who are all over the world who are developing the software. And we have annual meetings, sort of reunions, every year. So now it's kind of neat because the postdocs who've now started their own faculty positions have all developed groups of their own. So they bring students and postdocs of theirs back. It's fun, and we get a lot of ideas on how to proceed in the work we are doing in all of these different groups.
Would you consider Rosetta your seminal work? Or have you had other developments in your group that match it?
I would have a hard time answering that. I am very happy about the community that's developed around the software and it's just great, I am absolutely delighted. ... We are trying to make it as widely used as possible. Academic people working for non-profits should be able to get the software for free, get the code for free and do whatever they want with it. And the way that it works is companies have to license the code and with those licenses, we support this annual meeting. That's what pays for people's plane tickets back to Seattle from Israel or wherever they happen to be.
What is next for you at the lab?
We have some really exciting results now, which hopefully you will be reading about before too long, on using Rosetta to design brand-new enzymes that catalyze reactions for which there are no naturally occurring catalysts, which I think can have a big impact in medicine and industry.
Do you have any guesstimates on the timeline?
We are submitting the papers [for publication] now. We already have the results.