Name: Brewster Kingham
Position: Director, DNA Sequencing & Genotyping Center, Delaware Biotechnology Institute, University of Delaware, since 2005
Experience and Education:
Associate scientist, University of Delaware, since 1998
MS in virology, University of Delaware, 1998
BS in animal science, University of Delaware, 1994
The University of Delaware DNA Sequencing & Genotyping Center provides Sanger sequencing, next-generation sequencing, and several other genomic services to investigators at the university and outside research groups.
For sequencing, the center is equipped with an Illumina HiSeq 2000, an ABI 3130 XL Genetic Analyzer, and a Pacific Biosciences PacBio RS single-molecule sequencer, which it installed last September.
Last week, In Sequence caught up with the center's director, Bruce Kingham, and asked him about his experience with the PacBio platform so far. Below is an edited version of the conversation.
Why did you decide to bring in the PacBio platform, and where did the funding for this instrument come from?
We're fortunate at UD that we have a very prestigious group of genomics researchers here. They are very well-funded through various federal sources — NIH [National Institutes of Health], NSF [National Science Foundation], and USDA [US Department of Agriculture], among others.
We put together an NSF major research instrumentation grant. We had put in similar grants for several years for the 454 [sequencer] that were not approved. Each time, [the application was] well-received and the reviewers provided some very helpful advice on how to improve this for the next year. When it came time to submit again, we decided to go for a Pacific Biosciences machine because we felt at the time that this technology was going to be the third generation of sequencing. We felt that eventually, it would be a successful technology and it would be a perfect complement to the other chemistries and technologies that we have here.
That MRI grant was submitted in April or May of 2010. We received approval for it in the summer of 2010, and our instrument was installed in September of 2011. We were, I believe, the 25th site in the world to get one.
There were a lot of issues, a lot of problems with it. The technology was not well-refined at that point. Late last fall, the company was also going through a difficult time financially, and it put a lot of pressure on them to change their business. What I would say about the company, though, is that they were very committed to getting these machines up and running. From the time that we had the machine installed in early September, between engineers and application scientists, we probably had a full-time Pacific Biosciences employee here for six straight months.
It was a challenge. We really could not get data off of that machine for about six months. But we went into this project with low expectations. Most people understand that when you're buying a brand new technology that really has not been introduced before, you kind of have to go into it with low expectations, and fortunately, we did.
Why do you think PacBio released its instrument seemingly prematurely?
If you want my opinion, I think their intention was that the early access program was going to give them time to identify issues and problems, and to tweak the instrument prior to commercialization. I think what actually happened is, it kind of opened up the challenges that are associated with true single-molecule sequencing. I think it's a much more challenging aspect of genomics than a lot of people realize.
Helicos was the first single-molecule sequencer, and there were known issues associated with that machine. I think for the most part, a lot of people in my type of position did not think that the Helicos machine was going to be a success other than being a stepping stone to the machine that was going to truly deliver single-molecule sequencing. In that respect, Helicos was a very important machine. It gave Pacific Biosciences the opportunity and the time to refine the procedures that are necessary for single-molecule sequencing.
What has changed since you started using the C2 chemistry?
The difference between C1 chemistry and C2 chemistry is like night and day. Right from the start, we were getting very good data from the C2 chemistry. Again, a credit to the company; they recognized the problems that were associated with the machine, and once they did recognize them, it seems that they addressed them very quickly. When they had started to see some of the preliminary data [with the C2 chemistry] in house, they realized that it was important to roll out that upgrade much sooner. Not only for the sake of those of us in the field that are running the machines, but for their own sake as well.
What kind of performance do you now get?
Our average read length is around 4,000 base pairs. The yield is dependent on the type of analysis that's done at the end of the run, but when we use the most liberal analysis, we are seeing 300 megabases per SMRT cell.
How have you been using the machine so far?
We have run just about everything on there — we've run viral samples, bacterial samples, and eukaryotic samples. We have run genomes that were amplified from single cells, and we have run marine metagenomics samples. These were actually PCR amplicons where we amplified the 16S ribosomal region, generated libraries from it, and ran them on the machine. In the future, we hope to not have the need to amplify.
The amount of input material required has been a large issue, which the company seems to be addressing fairly considerably now. About three to four months ago, we were asking for about 10 micrograms of input DNA. That's an obstacle to a lot of investigators, to be able to get that much DNA. And not only that, the quality of the DNA has to be very high as well. We're now getting ready to test some samples with 500 nanograms, which is considerably less.
The amount of input material required has changed, but not necessarily the quality. Since this is single-molecule sequencing, the quality of the input material is critical to the success of the instrument. With all those things considered, that makes this instrument a very challenging platform to introduce into a core setting.
How has the platform been received by the users of your facility?
Since the C2 chemistry was introduced, it's been received very, very well. We're still kind of working through how to handle this data. PacBio, as well as third parties, is putting a lot of effort into refining the software, and we are working with some open source software programs to do that as well.
It's understandable that we're in that position because you can't put the cart ahead of the horse. You don't know how to analyze the data until you have the data to analyze. Basically, that's where we are now. We can run the instrument, and we can see the average read length, how much data we have, so we get a preliminary assessment of the data, but the downstream analysis of it is very time-consuming, and it's still ongoing.
What kinds of projects do you think the PacBio will be most useful for?
It would be a very effective tool for the de novo assembly of genomes that do not have a reference. There is a strategy being used right now where you generate Pacific Biosciences data and Illumina data, and then you use the more accurate Illumina data to error-correct the PacBio data. And then you use that error-corrected data to build the scaffold onto which you layer your Illumina data. That has the effect of generating much longer contigs for de novo assembly.
Obviously, we have a great interest here in metagenomic analysis. The holy grail of this would be to take an environmental sample and, without doing any type of amplification, sequence individual DNA molecules to be able to determine the species that are in it. It all comes down to the amount of input material. We are working on reducing that. We're not yet at the point where we can take a metagenomic sample that hasn't been amplified, but we're working towards that.
How do you deal with the high single read error rate of the PacBio?
When designing a project for this platform, it's necessary to go into it with an understanding of that high error rate and how you're going to deal with that. With 16S ribosomal sequencing, for instance, the way to deal with it is by circular consensus sequencing. We can take advantage of the 4,000 base pair reads. If we have a 600 to 700 base pair gene that we are amplifying, we can generate from four to six reads from that single molecule.
Each time you generate an additional read of the molecule, you increase the accuracy of the data that's coming off the machine. If we can do two passes of the same molecule, the read accuracy is generally at or above 97 percent; if we can do three passes, the accuracy is at or above 98 percent; if we can do five or more passes, the accuracy is at or above 99 percent. With single-molecule sequencing, if the high error rate is going to be a large issue for downstream analysis, then the way to deal with that is by circular consensus sequencing. That obviously limits the length of the read because the fragment you are sequencing is going to be considerably smaller, but you get much higher accuracy.
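As a rough illustration of the circular consensus arithmetic described above, the following Python sketch estimates how many passes fit in one read and looks up the accuracy floor quoted for that pass count. The function names are hypothetical, adapter length is ignored, and treating four passes as at least as accurate as three is an assumption; only the 97, 98, and 99 percent figures come from the interview.

```python
# Illustrative sketch only; not PacBio software. Function names are hypothetical.

def estimated_passes(read_length_bp: int, insert_length_bp: int) -> int:
    """Whole passes over the insert that fit in one read.

    Simplifying assumption: adapter sequence is ignored, so real
    pass counts would be somewhat lower.
    """
    if insert_length_bp <= 0:
        raise ValueError("insert length must be positive")
    return read_length_bp // insert_length_bp


def quoted_accuracy_floor(passes: int):
    """Return the 'at or above' accuracy quoted in the interview.

    Four passes are assumed to be at least as good as three
    (an assumption; the interview gives no separate figure).
    Returns None when no figure was quoted (fewer than two passes).
    """
    if passes >= 5:
        return 0.99
    if passes >= 3:
        return 0.98
    if passes >= 2:
        return 0.97
    return None


if __name__ == "__main__":
    # 4,000 bp reads over 600 to 700 bp amplicons, as in the interview.
    for insert in (600, 700):
        n = estimated_passes(4000, insert)
        print(insert, n, quoted_accuracy_floor(n))
```

Under these assumptions a 4,000 base pair read covers a 600 to 700 base pair amplicon five or six times, which is in line with the four to six reads mentioned once adapter overhead is taken into account.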
How does PacBio data compare to other types of sequencing data in terms of cost?
I don't think it would be really fair to compare it to other types of sequencing data, simply because it's a very different type of data. You can make the comparison between Sanger, Illumina, and PacBio as far as the cost per megabase, but it's really not a fair comparison. The advantage of Sanger is the quality of the sequence, but on a per-megabase basis, it's very expensive. Illumina is very inexpensive on a per-megabase basis, but your read length is much lower, and your error rate is higher than for Sanger. With the PacBio, you have the advantage of single-molecule sequencing, and also, you have the advantage of generating reads that are 4,000 base pairs long.
To provide some reference, we charge approximately $450 to generate a PacBio library, and then sequencing charges are about $250 per SMRT cell. If we generate, say, 250 megabases per SMRT cell at $250, that's about a dollar per megabase. Obviously, those numbers are going to be changing in the future.
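The arithmetic in this answer can be sketched in a few lines of Python. The prices and the 250 megabase yield are the figures quoted above; the function names and the four-cell example project are hypothetical and serve only to show how the one-time library charge changes the effective cost.

```python
# Illustrative sketch only; function names and the example project are hypothetical.

LIBRARY_PRICE_USD = 450.0   # quoted: approximate charge per PacBio library
CELL_PRICE_USD = 250.0      # quoted: approximate charge per SMRT cell
YIELD_MB_PER_CELL = 250.0   # quoted as a "let's say" figure


def sequencing_cost_per_mb(cell_price: float, yield_mb: float) -> float:
    """Sequencing-only cost per megabase for one SMRT cell."""
    return cell_price / yield_mb


def project_cost_per_mb(n_cells: int) -> float:
    """Cost per megabase when one library charge is spread over n_cells cells."""
    total_cost = LIBRARY_PRICE_USD + CELL_PRICE_USD * n_cells
    total_yield = YIELD_MB_PER_CELL * n_cells
    return total_cost / total_yield


if __name__ == "__main__":
    print(sequencing_cost_per_mb(CELL_PRICE_USD, YIELD_MB_PER_CELL))  # 1.0
    print(project_cost_per_mb(4))  # 1.45
```

The dollar-per-megabase figure covers sequencing alone. In the hypothetical four-cell project, spreading the library charge across the run raises the effective cost to about $1.45 per megabase.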
The company has some fairly exciting updates planned in the next year, and considering the experience that we have had with them over the past four to five months, I'm inclined to believe that they will come through on what they hope to do over the next year, to improve the yield, and to directly detect methylation, something that many people have considered a huge advantage of this machine. We have several investigators at the University of Delaware that are just sitting back and waiting for that feature to be introduced.
Have you automated library preparation for the PacBio?
No, we haven't, simply because I would still say that we're more or less in the validation stages with this machine. Once it gets to the point where automation is going to be a necessity, we will have to consider that.
What's top of your wish list for improvements to the PacBio?
Obviously, the error rate, and improving the number of reads per run. Right now, a maximum of 30 to 40 percent of the available ZMWs are occupied by a single polymerase providing data, and we'd like to see that improved. We'd also like to see longer read lengths.
But it doesn't matter what platform you're looking at — I could say the same thing for Sanger, the same thing for Illumina; I'd like to see more yield, longer reads, better accuracy.
More specifically to the PacBio, I would like to see the requirements for input DNA go down considerably. That could really open up this machine to much more effective metagenomic analyses. Also, we are really anticipating the methylation detection, and how effective it is going to be.
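One way to reason about the 30 to 40 percent ZMW occupancy figure mentioned in this answer is a simple loading model. The Python sketch below assumes polymerases land in ZMWs independently, following a Poisson distribution. That assumption is not stated in the interview, and the function name is hypothetical. Under that model, the fraction of wells holding exactly one polymerase peaks at about 37 percent.

```python
# Illustrative sketch only. Poisson loading is an assumption, not a claim from the interview.
import math


def single_occupancy_fraction(mean_per_zmw: float) -> float:
    """Probability that a well holds exactly one polymerase.

    Uses the Poisson probability P(k = 1) = lam * exp(-lam),
    where lam is the mean number of polymerases per well.
    """
    if mean_per_zmw < 0:
        raise ValueError("mean loading must be non-negative")
    return mean_per_zmw * math.exp(-mean_per_zmw)


if __name__ == "__main__":
    # Under-loading and over-loading both reduce the singly occupied fraction.
    for lam in (0.5, 1.0, 2.0):
        print(lam, round(single_occupancy_fraction(lam), 3))
```

In this model, loading more or less than one polymerase per well on average lowers the usable fraction, so raising occupancy well beyond this range would need something other than random loading.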
What other sequencing platforms are you keeping an eye on?
Like most people, we are paying attention to Oxford [Nanopore Technologies]. They said they will be pricing their USB-based MinION at about $900. At that price, there is no reason for us not to try it. We really haven't heard too much about ancillary charges that may be associated with it, and we're waiting to hear more about that. But if they offer a single-molecule sequencer at $900, that's something that we have to take a serious look at.
I was at AGBT, and I heard their announcement, and it reminded me very much of PacBio's presentation at AGBT three to four years earlier. And it's just now that PacBio has become, from my perspective, a legitimate instrument. But it's much easier to test a $900 instrument than it is to test a $750,000 instrument.
Are there any other sequencing platforms you're interested in?
Not right now. At UD, there has really not been a lot of interest in Ion Torrent. I'm not quite sure what the reason for that is; it may be for no other reason than that we have our hands full here with the PacBio.
Is there anything else you'd like to add?
Over the past four months, we have been very impressed with what our PacBio RS has been delivering. The company has been very attentive, so we're encouraged that they are going to continue to deliver some advancements that are going to make this platform much easier for investigators to incorporate into genomics projects.