University of Washington, Department of Genome Sciences
Name: Jay Shendure
Title: Assistant professor, University of Washington, Department of Genome Sciences (starting this fall)
Education: MD, Harvard Medical School, 2007
PhD in genetics, Harvard University, 2005 (worked with George Church)
AB in molecular biology, Princeton University, 1996
As an MD-PhD student in George Church’s lab, Jay Shendure helped develop polony sequencing, a highly parallel sequencing-by-ligation method that formed the basis for Applied Biosystems’ SOLiD sequencer.
Lately, Shendure has been focusing on alternatives to PCR for selectively amplifying parts of the genome for sequencing. In Sequence spoke to him recently to find out what led to his landmark 2005 Science publication describing polony sequencing, how his current work is going, and what his plans are for this fall, when he starts his own research group at the University of Washington.
Where did the ideas for polony sequencing come from, and how did you develop them?
It has kind of a long, convoluted history. Polony sequencing is a term that can be used for our sequencing technology as well as for a number of other technologies. Another term for it is ‘cyclic array sequencing,’ with this basic concept of arrays of features being sequenced in parallel, using a single reagent volume to manipulate all the features in parallel. That idea has been around for quite a long time. But what wasn’t there was a practical implementation of it.
The original polony work in the Church lab was really done by Rob Mitra, who is now a professor at Washington University in St. Louis. The focus at that time was a little different in the sense that the system that was being used was in situ polonies. The basic idea there was performing PCR in the context of an acrylamide gel, where one of the primers is immobilized in the gel, so you end up with these spheres that originate clonally from individual single molecules.
When I joined the lab, I wandered around from project to project for a few years, and ultimately ended up working with Rob on some of this stuff. He was a postdoc, and I was a graduate student. We worked out ways to sequence in situ polonies using single-base extension, which resulted in a paper in Analytical Biochemistry in 2003. At the time we published that, we were sequencing 20 templates with 5- to 6-base-pair reads. It was still a long way from what we really wanted to do. Rob ended up leaving around that time, and I took over the Church lab's end of the project.
Around the same time I had the good fortune of meeting Greg Porreca, who will shortly defend his thesis [here at Harvard]. He was a rotation student with me, just a phenomenally talented guy, so I ended up convincing him to never even do another rotation, and the two of us worked together for three years, taking the technology from where it was then to where we finally published it in 2005.
The term ‘polony sequencing’ remained the same, but we basically changed almost everything that was there in the course of trying to get it to work. One of the key innovations was emulsion PCR, developed by Bert Vogelstein’s lab at Johns Hopkins University, which is also used by 454 and by Applied Biosystems’ SOLiD system. The second key innovation was switching to a sequencing-by-ligation scheme. Also, for many years we were using microarray scanners to image, which is a remarkably slow way of collecting images. So getting the signal density to a point where we could sequence using a CCD camera-based instrument was another important improvement. The actual integration of all this, I think, was the biggest challenge. We developed a biochemistry for sequencing, we had to build a sequencing instrument, we had to write the software to read the images and turn them into sequence, and we had to come up with new ways of constructing libraries, including mate-pair libraries, that didn’t rely on E. coli but were purely in vitro. The really hard part was integrating across all these different disciplines. But in the end, it worked out.
What aspects of polony sequencing are now being used in the commercial next-generation sequencing platforms?
The emulsion PCR is now used by 454 as well as by the ABI SOLiD. The mate-pair in vitro tag libraries are used at least by ABI. Sequencing-by-ligation, the way that we did it, was quite different from the way that ABI is doing it now. They developed a method for serial ligation events to give you longer read lengths, which I think is great.
What have you and your colleagues been working on since the 2005 Science paper?
The research has been going forward on a couple of fronts. One focus is developing a second generation of the instrument that can be manufactured and is cheaper. Part of the problem is that building one the way we did requires quite a lot of very specific knowledge and some tricks to get it together. We have a great engineer working on it, Rich Terry, who has come up with a plan. He is in the process of building a second-generation instrument that will be much easier to put together and will cost less than $100,000 to build.
What Greg Porreca and I have been more focused on is figuring out ways to capture specific subsets of the genome. In my mind, the commercial technologies are now moving very quickly; they have large teams of people working on them. There are certain applications where it’s a piece of cake, like ChIP-Seq, where you are just sequencing a short tag. That fits these platforms very well, because they give you short read lengths, and it’s very easy to go from the protocol to having a library.
But there are other things that people want to do, for example in the Cancer Genome Atlas proposals, as well as in the ClinSeq study from NHGRI. The idea is medical resequencing or cancer resequencing, where what people are interested in are not the common SNPs, and also not the whole genome, but a small subset of the genome. We just lack good technologies for isolating a subset, beyond uniplex PCR. It’s not just the exons, though I think that’s one important subset; the problem is generic. There are lots of questions in medical sequencing, as well as in cancer genomics, for which you would want to sequence the whole exonome, if you could do it cheaply. Other sorts of projects, like whole-genome association studies, are identifying regions, and the next step for a lot of those studies is to resequence those regions in lots of people. But there is no effective way to isolate a 1-megabase or 3-megabase region of the genome that’s scalable in the same way that the new sequencing technologies are. Shifting our focus away from the sequencing itself and toward some of this front-end stuff is going to be very important. And I think people are starting to wake up to the fact that we don’t have that.
Tell me about the amplification method you have been developing. How does it differ from other approaches, for example the recent publication in PNAS by researchers at Stanford University (see In Sequence 5/22/2007)?
We are essentially using a method that is a derivative of the molecular inversion probe methods used for multiplex genotyping. I think one key distinguishing feature is that we are primarily using oligonucleotides derived from microarrays. If you want to order these one by one, and you want to do 200,000 or 250,000 targets, that’s quite an expensive upfront cost, which makes it pretty unrealistic to try in the first place. But if you can get oligos from custom-synthesized, programmable microarrays and release them to generate your probe pools, that’s a much more effective way of doing things. You need a chemistry that lets you release the oligos from the chip. And the actual sequence needs to be very accurate. Some synthesis methods give you arrays that are useful for hybridization experiments, but if you were to release the sequences from the chip and sequence them, they would be quite terrible from an accuracy point of view. We are in the process of exploring a number of vendors and evaluating them.
How many regions can you amplify in parallel now?
We are at about 12,000 exons amplified in a single reaction right now, and we are planning to try 55,000 soon. The 12,000 works, and we think there is no reason the 55,000 won’t. Actually, I think the bigger challenge does not necessarily have to do with the complexity of the reaction. It’s possible we will push this to 100,000, but the bigger challenge is going to be uniformity. The problem that we are seeing, and I think you see it when you read the PNAS paper, too, is it definitely works, but it could work a lot better if the relative amplification of individual targets was uniform. Non-uniform amplification poses a big cost when you actually go to sequence. So I think solving that problem is another key challenge, probably the biggest obstacle to getting this to work efficiently.
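To make the sequencing cost of non-uniform amplification concrete, here is a minimal Python sketch. It assumes, hypothetically, that each read lands on a target with probability proportional to that target's abundance after amplification, so the least-amplified target determines how deeply you must sequence. The function names and the example abundances are illustrative and do not come from the interview.

```python
def reads_needed(abundances, min_coverage):
    """Expected total reads so the least-abundant target gets min_coverage reads.

    Hypothetical model: a read comes from target i with probability
    abundances[i] / sum(abundances).
    """
    total = sum(abundances)
    weakest = min(abundances)
    # The weakest target receives weakest/total of all reads.
    return min_coverage * total / weakest


def overhead_vs_uniform(abundances):
    """How many times more reads are needed than for a perfectly uniform pool."""
    n = len(abundances)
    # A uniform pool needs min_coverage * n reads; the ratio reduces to mean / min.
    return sum(abundances) / (n * min(abundances))


uniform = [1, 1, 1, 1]
skewed = [1, 2, 4, 8]  # illustrative 8-fold spread in amplification
print(reads_needed(uniform, 20))   # 80.0
print(reads_needed(skewed, 20))    # 300.0
print(overhead_vs_uniform(skewed)) # 3.75
```

Under this simple model the overhead equals the mean abundance divided by the minimum abundance, so a single poorly amplified target can inflate the sequencing required for the whole pool.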
How would you solve this?
By grouping primers that work with similar efficiencies, or by adjusting the concentrations of the primers within the mix, you can correct for it. The solution is probably going to be empirical rather than some magic change to the protocol.
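An empirical rebalancing step along these lines might look like the Python sketch below. It rests on a hypothetical linear model: each probe's pilot read count is proportional to its concentration times an unknown efficiency. Each probe is then rescaled inversely to its estimated efficiency while the total probe concentration is held fixed, and the probes are binned into groups whose efficiencies fall within about 2x of each other. The function names are illustrative, and probes with zero pilot reads would need separate handling.

```python
import math
from collections import defaultdict


def rebalance(concentrations, pilot_counts):
    """Rescale probe concentrations so expected yields become equal.

    Hypothetical model: pilot_counts[p] is proportional to
    concentrations[p] * efficiency[p]. Assumes every pilot count is > 0.
    """
    efficiency = {p: pilot_counts[p] / concentrations[p] for p in concentrations}
    # Concentration inversely proportional to efficiency equalizes expected yield.
    raw = {p: 1.0 / eff for p, eff in efficiency.items()}
    # Keep the total probe concentration unchanged.
    scale = sum(concentrations.values()) / sum(raw.values())
    return {p: r * scale for p, r in raw.items()}


def group_by_efficiency(concentrations, pilot_counts):
    """Bin probes so each group spans at most a 2-fold range of efficiency."""
    groups = defaultdict(list)
    for p in concentrations:
        eff = pilot_counts[p] / concentrations[p]
        groups[math.floor(math.log2(eff))].append(p)
    return dict(groups)


conc = {"a": 1.0, "b": 1.0}
pilot = {"a": 100, "b": 400}
new_conc = rebalance(conc, pilot)
# Probe "a" is boosted and "b" is reduced, so both now expect equal yield.
print(new_conc)
```

Whether real probe yields follow this linear model is an open assumption. In practice, several pilot-and-adjust rounds would probably be needed, which matches the empirical flavor of the approach described above.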
So it would be different for every project?
No. In my mind, there is sort of a golden solution here. If you can come up with a set that captures the exonome, you only need to do that once. In my mind, that’s a better route than everyone coming up with their own sets for their own particular pet project. You come up with one pool that captures the exons, it costs $1,000 or a couple of thousand dollars to sequence that subset, and the upfront work of getting it normalized is something you only do once, and then you have got it.
The regional amplification is a different challenge because everyone is interested in a different megabase of the genome for whatever reason. At some point, it might make sense to have a set of regional targeting reagents. So for example, if there is some common resource, you can just order from this resource this reagent, and it is designed to give you even amplification of a particular one-megabase region of the genome. And if you want five megabases, you order those five megabases. You can also imagine the upfront work being done there once. The alternative to that is coming up with a way of doing it in a normalized fashion on the first shot. Either of those things could happen, depending on how things go.
You will soon move to the University of Washington and start your own lab. What are you planning to work on there?
The exon sequencing I will continue to work on in collaboration with George [Church]. I think I’ll continue to work on technology, but I’m not necessarily going to be developing sequencing instruments. In terms of the exon sequencing, I’m in the process of developing some collaborations. I have not been spending my time building cohorts of interesting patients, so the best way I can apply this is to collaborate or partner with people who already have these great cohorts sitting in their fridge and start doing some sequencing.
How do you view the future of the next polony sequencer?
The hope is for it to cost less than $100,000, so it would be cheaper than any of the commercial platforms. But it’s a balancing act. There is absolutely no question that commercialization adds value; it inherently does. It may take a special kind of lab to want to actually build their own sequencer. That may not be for everyone. But I think one of the great things about this next generation of sequencers is that, unlike the last generation, we actually have choices. That’s a lot more fun than not having choices.