University of Oxford
Name: Hagan Bayley
Position: Professor of chemical biology, University of Oxford, since 2003
Experience and Education:
— Professor and head of department of medical biochemistry and genetics, Texas A&M University System Health Science Center, and professor of chemistry, Texas A&M University, 1997-2003
— Associate professor of biochemistry and molecular biology, later also physiology, University of Massachusetts Medical Center, 1991-1996
— Senior scientist, then principal scientist, Worcester Foundation, 1988-1996
— Associate professor, center for neurobiology and behavior, Columbia University, 1987-1988
— Assistant investigator, Howard Hughes Medical Institute, Columbia University, 1985-1988
— Assistant professor of biochemistry, Columbia University, 1981-1984
— PhD in chemistry, Harvard University (group of J.R. Knowles), 1979
— BA in chemistry, University of Oxford, 1974
Hagan Bayley, a professor of chemical biology at the University of Oxford, has been studying protein nanopores for more than 20 years. In 2003, he returned to his alma mater after almost 30 years of research in the United States.
Two years later, he and Reza Ghadiri from the Scripps Research Institute won a five-year, $4.2 million grant from the National Human Genome Research Institute’s $1,000 Genome program to work on single-molecule DNA sequencing using engineered protein nanopores.
Also in 2005, he founded Oxford NanoLabs (see In Sequence 4/8/2008), recently renamed Oxford Nanopore Technologies, to develop the technology commercially. In Sequence visited Bayley at his lab in Oxford last week to find out more about his research.
What are your general research interests?
Basically, I work on membrane proteins. We are an interdisciplinary research group, so I have chemists and biochemists and biologists in the group, and also people with more of a physical chemistry or physics background, and the occasional engineering student. We try and apply all these techniques to understand how membrane proteins work. I think what differentiates us from a lot of other research groups is [our focus on] applications of engineered membrane proteins. There is lots of work on engineered enzymes and engineered antibodies, but not as much work on [engineered] membrane proteins.
We have put a lot of effort into the α-hemolysin, which happens to be the same protein that some groups have been using for [nanopore-based] DNA sequencing. But [we have also been working on] other proteins. There are two major classes of membrane proteins: β-barrels, of which α-hemolysin is one, and α-helix bundles, which potassium channels represent. We work on both these different classes.
Do you study their structure and their function?
Mainly their function. We look for small-ish proteins that are relatively stable but tractable experimental subjects, and then we look at their properties. But also, we do a lot of protein engineering on them, so we try to choose simple proteins that maybe have simple properties, and engineer more sophisticated properties into them. And at the same time, we look for applications of these proteins, both in basic science and in biotechnology.
Tell me about α-hemolysin. How did you get interested in this protein, and when did it first occur to you to use it for sequencing DNA?
Initially, we were interested in that just as a basic science problem, because this is a soluble protein that is secreted by a gram-positive bacterium, Staphylococcus aureus, and it assembles into target membranes. This was a very interesting question of how you could get a very highly water-soluble protein to assemble into a lipid bilayer. I had just one student working on that in the 1980s. And then, right around 1989/1990, we decided to engineer this protein to give us some useful properties. Basically, it's really a blank slate, it just makes a hole in lipid bilayers and those holes stay open. We wanted to use it to, say, permeabilize cells, or kill cancer cells, or maybe for some sort of narrow filtration. There were lots of applications that we envisaged. One of them, at the time, was sensing, and in particular, we had this idea of single-molecule sensing. There is obviously a lot of work on single-channel recording for membrane proteins, [so we wanted to] engineer them in some way, [so] the analyte, the molecule you wanted to look at in solution, would bind inside the protein, and would change the current flow. We basically started with some basic science, but pretty soon got into these applications and this so-called stochastic sensing became quite a big area for us. We were able to show that the protein can act as a single-molecule detector for metal ions, organic molecules, proteins, DNA, really just about anything.
As long as it's small enough to fit in the pore?
Actually, it doesn't have to be small enough to fit through the pore because we also have this trick of designing a pore that has a kind of fishing line inside, so when it binds something on the outside, this is also registered in the current flow. So they don't have to be small molecules, they can actually be quite large molecules as well. And, really at the same time, [Daniel] Branton and [David] Deamer dreamed up this idea of sequencing DNA. They made some steps forward using the un-engineered protein, just the wild-type protein, but it seemed obvious to me all the way along that you would have to use some sort of engineered protein, because you need to slow down the DNA and get the protein to recognize the DNA in some way. When the $1,000 Genome initiative came along, Reza Ghadiri at [the] Scripps [Research Institute] said, ‘We really ought to write [an application] for this.’
How do you know Reza Ghadiri?
I have known him for a long time. We had grants from the same funding agency, like the Office of Naval Research, so we were at a lot of meetings together. We didn't really start working on this until, really, a couple of years ago [when we won a $1,000 Genome grant].
So what have you done in the last couple of years? How far along have you developed this technology?
I think when I started, I was a bit skeptical, because people worked on it for 10 years, and really, had not made much progress in terms of sequencing. So we took a fresh look at the situation, and I think the one thing missing was the protein engineering, and that's what we are real experts at.
It also seemed to us, as well as pulling the strand of single-stranded DNA through [the pore], and trying to sequence the bases as they go through, [what we call strand sequencing], we could revive this idea of exonuclease sequencing that really came from [Richard] Keller [at Los Alamos National Laboratory]. Around 1980, he had the idea of taking a DNA strand and making a complementary strand with a polymerase in which all four bases were labeled with different fluorescent dyes. And then you would put this in some sort of flow tube, maybe on a bead with optical tweezers or something, and then add an enzyme, exonuclease, that would cleave off one base at a time. Their idea was that this base would flow downstream past a laser and you would use single-molecule fluorescence detection to see which base had been cleaved off. I think the two major problems with that were that it's very hard to make this strand of DNA completely substituted with four fluorescent bases, and also, a tiny amount of fluorescent background can mess you up. Also, in retrospect, it would be very hard to make that highly parallel as well.
[We had] the idea that maybe we could revive this exonuclease sequencing by using our single-molecule detection that we had already developed for other things. For example, we had been able to show previously that you can even distinguish between the two enantiomers of a molecule like ibuprofen using a pore with a so-called adaptor in it. So we put cyclodextrins inside the pore to act as adaptors.
How far have you developed strand sequencing?
Most of the published work, so far, is from Ghadiri's group. One of the real problems of the strand sequencing is that the DNA goes through the pore very quickly, too fast for the usual techniques of single-channel recording to register the bases. They take one to five microseconds to go through the pore, which is very fast. And also, the base recognition was not solved. That base recognition also has to be done in the context of the neighboring bases. So while a particular base, say A, is in the pore, there are going to be 15 other bases on each side also in the pore also changing the current, if you are not very careful. There have been, really, two major pushes there. One is to recognize the bases, and we have been able to show that we can recognize bases when the DNA has stopped moving, when it's still. When you hold it within the pore, you can recognize the bases. I think more work has to be done to make sure that the neighboring bases don't interfere with the recognition. So we are continuing work on the recognition, and so is Reza.
The second thing you need to do is to slow the DNA down, probably about 100- to 1,000-fold, to enable you to do this recognition while the DNA is on the move. There are various different ways of doing that. One is to engineer the protein to grab onto the DNA in some way and slow it down, but the other one that many people have suggested is to use some enzyme to pull or push the DNA through the pore. Ghadiri's group has done some nice work on that, showing that DNA polymerase, when it makes a second strand, you can use that to pull the DNA through the pore. Both Ghadiri's group and our group are actively working on all these issues involving strand sequencing.
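As a rough check on the figures quoted above, the following minimal sketch multiplies the stated transit time of one to five microseconds by the stated 100- to 1,000-fold slow-down. It assumes the quoted transit time applies per base; the function and constant names are illustrative and do not come from the interview.

```python
# Per-base transit times quoted in the interview, in microseconds.
# Assumption: the "one to five microseconds" figure is per base.
UNSLOWED_DWELL_US = (1.0, 5.0)

# Slow-down factors quoted in the interview (100- to 1,000-fold).
SLOWDOWN_FACTORS = (100, 1000)


def slowed_dwell_ms(dwell_us: float, factor: int) -> float:
    """Return the per-base dwell time in milliseconds after slowing."""
    return dwell_us * factor / 1000.0


if __name__ == "__main__":
    for dwell_us in UNSLOWED_DWELL_US:
        for factor in SLOWDOWN_FACTORS:
            slowed = slowed_dwell_ms(dwell_us, factor)
            print(f"{dwell_us} us per base, slowed {factor}-fold: {slowed} ms per base")
```

Under these assumptions the slowed dwell time falls between 0.1 and 5 milliseconds per base, which is the millisecond regime that single-channel recording would need in order to register individual bases.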
What about exonuclease sequencing?
On the exonuclease sequencing side, to get back to that, we were able to show, in a paper published in 2006, that we can indeed recognize all four bases very, very cleanly. We just used the bases themselves; we have not actually gotten the sequencing to work. We were able to put a cyclodextrin molecule, a cyclic oligosaccharide, inside the pore, and the nucleoside monophosphates bind inside. When they bind, they change the current that flows through the pore. And we were able to show that you could get a different current amplitude for G, A, T, and C. That, I think, was a really significant paper. In other words, we are able to identify the bases without any fluorescence, so it's the native, natural bases that we can identify.
Right now, what we would like to do — and Oxford Nanopore Technologies are working on this now actively — is to attach an [exonuclease] enzyme on top. DNA would come along, and what you would like to do is have it snip off bases one at a time, and then you need to drive them into the pore. You need to make sure 100 percent go in, and then you need to make sure they come out the other side. I think we have solved the recognition, and we have shown that they come out the other side. What we really need to show now is that 100 percent of these are actually pulled into the pore.
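The recognition step described above, in which each base gives a different current amplitude when it binds inside the pore, can be illustrated with a minimal nearest-level classifier. This is only a sketch of the general idea, not the group's method: the reference current levels, the tolerance, and all names below are hypothetical values chosen for illustration.

```python
from typing import Optional, Sequence

# HYPOTHETICAL reference current levels in picoamperes, one per base.
# These numbers are invented for illustration and are not measured values.
REFERENCE_LEVELS_PA = {"G": 30.0, "A": 33.0, "T": 36.0, "C": 39.0}


def call_base(current_pa: float, max_distance_pa: float = 1.5) -> Optional[str]:
    """Return the base whose reference level is closest to the measured
    current, or None if no reference level lies within max_distance_pa."""
    best = min(
        REFERENCE_LEVELS_PA,
        key=lambda base: abs(REFERENCE_LEVELS_PA[base] - current_pa),
    )
    if abs(REFERENCE_LEVELS_PA[best] - current_pa) > max_distance_pa:
        return None
    return best


def call_events(currents_pa: Sequence[float]) -> str:
    """Call one base per binding event; uncallable events become 'N'."""
    return "".join(call_base(current) or "N" for current in currents_pa)


if __name__ == "__main__":
    # Hypothetical binding-event currents, one per cleaved base.
    print(call_events([30.1, 33.2, 35.8, 39.4, 50.0]))
```

The sketch also shows why capture efficiency matters: a base that never enters the pore produces no event at all, so it cannot appear even as an uncalled position.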
How do you split up the work between your academic group and the company?
In the lab, we have a $1,000 Genome grant, and we also have four postdocs funded by Oxford Nanopore Technologies. But [these postdocs] are working on basic science projects. And then we have some students who are on training grants. Altogether, it's probably about eight people in the lab working on this. And they work, really, on research problems that need to be solved, or new ideas involving DNA sequencing.
Oxford Nanopore Technologies are taking things that we have demonstrated and then trying to reduce them to practical use. In the case of the DNA sequencing, they have been refining the base identification, for example, and they are also making chips. Obviously, we have to make this highly parallel, so they have been making chips where you can have 100 pores operating at once, and eventually, we envisage many more than that. So basically, we are the research side, and they are the development side, although there is not an absolute strict partition between the two.
What do you think the first commercial nanopore sequencer will be capable of doing?
I think it would have to be somewhat parallel, so you would have maybe 100 pores. It would depend on what application you wanted. I think if you wanted to do microbial genomes, then you could have one or very few pores operating, but if you want to do human genome sequencing, then you would have to have maybe 100 or more pores working in parallel.
I think if it all pans out, what would differentiate us from the current sequencing technology, what we would hope for, is longer reads. And potentially, these reads could be very long, 10,000 bases. The new technologies, with the exception of the 454, all are short-read technologies, 25- to 35-base reads that you have to try to fit all together. So [they are] very good for resequencing, but [they] still have [their] problems [with] assembling all this. We would hope to do much longer [reads], and we would be competitive, initially, in terms of the speed and cost of sequencing. I think in the longer term, when we can make bigger chips with these pores on them, it would be very much cheaper than present technologies, because we don't have to use optical detection, which is turning out to be cumbersome and expensive.
What would you expect the error rate to be like?
I think the error rate in all the current systems is higher than [in] Sanger sequencing, which has an amazingly low error rate. It's a little hard to say where [ours] will stand compared to the new technologies. We would certainly hope to get up into that area. I think based on the base identification that we have done, we should certainly be better than 98 percent. The question is whether we will miss any bases, or, for some reason, miss a whole little sequence of bases. I think when you are resequencing, this is somewhat less of a bother, because you might have a run of, say, 300 bases, then a little gap for some reason, then another long run. So in some ways, it's a bit like paired-end sequencing. You might have runs that are actually connected on the same piece of DNA. So I think the error rate will be competitive, and, with time, improve, but I don't think it would ever get up to the level of Sanger sequencing.
Other groups are working on solid-state nanopores rather than protein nanopores. What are the pros and cons of both approaches, and why did you choose to bet on protein nanopores?
I think a lot of it is history, because I am a chemist, and a protein chemist. But I think in favor of the protein pores, I think we are fortunate, in a way, that the α-hemolysin is just the right size. Otherwise, you would have to engineer a pore of the right size. And then, these can be made completely reproducibly. We can just make billions of pores that are identical, which is very hard to do, if not impossible, with the solid-state systems.
And also, you can engineer these at the sub-nanometer, or Angstrom, level by genetic engineering or site-directed protein modification, chemical modification. These [protein] pores are [also] very stable. I think that's something that people working on the solid-state pores almost always start a lot of their papers with, saying that their pores are tremendously stable. But the α-hemolysin pore will work just as you come to the boiling point of water, 100 °C. It's a very, very stable protein. And in fact, in my other life of basic science, one of these hypotheses I have is that membrane proteins are, if not thermodynamically, tremendously kinetically stable to denaturation.
So the pores are very stable, you can produce uniform pores, which is very important for a single-molecule technique, and they can be engineered with exquisite precision, so I think they have a lot going for them.
I think one potential advantage of the solid-state pores is that you can array them very easily, although no one has actually demonstrated that, and it might be more difficult to make arrays of protein pores. But we think we have developed ways to do that, too. We have developed methods for putting single copies of pores into bilayers. So I don't think the manufacturing advantage for solid-state pores is there, either.
And then, I think in terms of detection, we haven't sequenced DNA yet, but you can actually see how you will. In the case of the exonuclease, we can detect the four different bases, and I think we can see that at least with the immobilized DNA strands. With the solid-state pores, their methods of base detection, at this point, are pretty much entirely theoretical. They say they have these transverse electrodes, and they will measure a tunneling current or a change in capacitance. To my knowledge, although there are many theoretical and computational papers, no one has actually experimentally demonstrated that they can do that. I hope I am correct in saying that, but I don't actually see that anyone has done that in a convincing way. I suppose that if what they think is true, it may be that you don't have to make such uniform pores when you have their somewhat mysterious detection method. So they may be able to get around that. Maybe they will be able to sequence much faster than we can. We are projecting one millisecond per base at best. Maybe they can sequence even faster; it remains to be seen.
I think right now, the protein pores are very obviously at the forefront, but it may be that something comes along for the solid-state systems that kind of blows us out of the water, but not really at this point.
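To put the projected rate of one millisecond per base in context, the following back-of-the-envelope sketch combines it with the 100-pore chips mentioned in the interview. The human genome size of roughly 3 billion bases is an added assumption, as is the idealization that every pore reads continuously with no gaps or overlap between reads; the names are illustrative.

```python
# Figures quoted in the interview.
MS_PER_BASE = 1.0  # projected best-case read time per base
PORES = 100        # pores operating in parallel on one chip

# Assumption (not from the interview): approximate haploid human genome size.
GENOME_BASES = 3_000_000_000


def bases_per_second(ms_per_base: float, pores: int) -> float:
    """Aggregate throughput if every pore reads continuously."""
    return pores * 1000.0 / ms_per_base


def hours_for_one_pass(genome_bases: int, ms_per_base: float, pores: int) -> float:
    """Idealized time to read each base of the genome exactly once."""
    return genome_bases / bases_per_second(ms_per_base, pores) / 3600.0


if __name__ == "__main__":
    rate = bases_per_second(MS_PER_BASE, PORES)
    hours = hours_for_one_pass(GENOME_BASES, MS_PER_BASE, PORES)
    print(f"{rate:.0f} bases per second, about {hours:.1f} hours per single pass")
```

Under these idealized assumptions, 100 pores would read about 100,000 bases per second and cover the genome once in a little over eight hours. Real sequencing needs redundant coverage, so this estimate is a lower bound on the time rather than a prediction.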
What is your best guess for when the first commercial nanopore sequencer will be available?
That is very hard to say. I think once we demonstrate sequencing, even only five or 10 nucleotides, then it will just come very quickly after that. And you can see with these other companies that one day they almost had nothing, and then, very quickly, once you got a handle on these things, you can get to the market pretty quickly. If you go back, look at the first pyrosequencing papers from 454, or at Solexa. These guys just had a few of their spots on their slides, or a few of their wells, behaving, just doing very few bases. But once they got to that point, they realized they could put a huge amount of energy behind it and get something to market pretty quickly.
Do you have a rough timeframe for when you hope to demonstrate sequencing?
I don't know. I think it's really something that could come very quickly. It could just suddenly come in the next month, or it could take several more years. It's a bit like, I don't think you would ask a mathematician when he would solve a particular problem. It could be that they are having a shower in the morning and they got it, or it could take several more years. I think it's that kind of thing, but more at an experimental level. But I think everything is in place. On the protein side, everything is in place to have this work. Exactly how long it will take, I can't say. But there is nothing to suggest that it can't be done.