Name: Harold Swerdlow
Position: Head of sequencing technology, Wellcome Trust Sanger Institute, since 2008
Experience and Education:
— Chief technology officer, the Dolomite Centre, 2006-2007
— Senior director of research (and other positions), Solexa, 2000-2006
— Unit coordinator and director of microarray core facility, Center for Genomics Research, Karolinska Institute, 1998-2000
— Research assistant professor, bioengineering and human genetics, University of Utah, 1993-1998
— Postdoc in human genetics, University of Utah, 1991-1993
— PhD in bioengineering, University of Utah, 1991
— BS equivalent in mechanical engineering, University of New Mexico, 1987
— BS in physics and mathematics, University of California, Santa Cruz, 1979
As head of sequencing technology at the Wellcome Trust Sanger Institute, Harold Swerdlow evaluates and manages all new sequencing technologies that enter the institute until they go into full production mode.
Formerly called the Sanger Center, the institute was founded in 1992 by the Wellcome Trust and the UK Medical Research Council and contributed about a third of the sequence to the Human Genome Project. It currently houses 27 Illumina Genome Analyzers, three ABI SOLiD sequencers, and two 454 Genome Sequencers, besides 50 ABI 3730 capillary electrophoresis instruments.
In Sequence recently spoke with Swerdlow, who spent about six years at Solexa, most recently as senior director of research, and started his new position at the Sanger Institute a few weeks ago.
What are your tasks in your new role as head of sequencing technology at the Sanger Institute? I understand this is a new position at the institute.
My tasks are generally managing all aspects of next-generation sequencing development. That includes research on novel methods and instrumentation for all the high-throughput DNA analysis techniques. [It also involves the] development of improved techniques for sample preparation, and improving performance of the current next-generation instruments, [as well as] assessing new sequencing technologies as they become available. Lastly, [it involves the] interim management of next-generation production, until the sequencing pipeline is robust and reliable. At that point, the production itself will be handed over to Carol Churcher, who is head of sequencing operations. The three of us — Tony Cox, who is head of sequencing informatics, and Carol Churcher and I — will be running all the different aspects of the sequencing. Julian Parkhill is director of sequencing and oversees all those functions.
Tell me about your experience as senior director of research at Solexa. How long were you involved with Solexa?
When the company moved from [being] a virtual company within the University of Cambridge to its real location, I was the third employee on the new site. I was there very early in development and took [the technology] all the way through to, essentially, the prototype stage, after the merger with Lynx. I was involved with all the different aspects of sequencing technology, including some informatics and chemistry, which was also done by other people, but [also] all the biochemistry, instrumentation, surface chemistry. I know a lot about how the instrument works on the inside. Not as much about how it works for the outside user, which I’m starting to learn here.
I understand parts of Solexa’s technology came from different places.
The seed idea came from the Cambridge University chemistry department, [in particular, Shankar Balasubramanian and David Klenerman]. And we acquired a company in Switzerland called Manteia that enabled the cluster technology. But all the rest of the technology came from inside Solexa.
What did you contribute to the field of capillary-based sequencing?
I was the first person to successfully run a DNA sequencing reaction in a capillary [Swerdlow, H. and Gesteland, R. (1990). Nucleic Acids Res. 18: 1415‑1419.]. [At the time, I was a graduate student] at the University of Utah. We were involved with a lot of the early developments of technology, instruments, polymers, things like that. We published some of the basic papers on eliminating bubble problems in capillaries, different technologies for detecting sequencing reactions in capillaries, optical methods for improving signal from a sequencing reaction, et cetera.
What kinds of sequencing platforms do you have at the Sanger Institute?
At the moment, we have 27 Illumina machines, two 454s [one GS-FLX and one GS-20], and three ABI SOLiDs.
That’s a lot of Illumina machines …
I guess we made a strong commitment to the Illumina platform. But we are also committed to staying abreast of new developments and actively pursuing relationships with new sequencing companies. We are not sitting still with the instruments we have. And we also still have 50 ABI 3730s, which are being used heavily at the moment to finish zebrafish and pig projects. So we have not totally abandoned the capillary methods.
Are you also staying abreast of improvements in Sanger sequencing?
If you are talking about something like Genome Corp., since that's sort of next-generation, it probably would concern me. But not minor protocol improvements and things like that. Major improvements, probably yes.
What kind of data storage and analysis hardware do you have to support the sequencers?
Our computer infrastructure has just been upgraded — it took about the last six months to do that — specifically to support the next-generation production. They built a high-speed compute farm with 640 cores, and there is a 320-terabyte file server for short-term storage of images and sequences from the machines. It’s all supported by a 1-gigabit backbone, which is quite powerful. The whole setup can support about 30 Illumina sequencers. Beyond that, we will probably need to beef it up a bit.
Is the plan to store all data coming from the machines indefinitely?
[What we currently have] is enough to handle the output of the machines so that people can analyze the data and look at it, [and] you don’t have to delete a run every time you start a new one. There is plenty of capacity for that kind of thing. But what’s going to happen after a month, I’m not exactly sure. There is definitely something in place.
Are you planning to add new sequencers in the near future?
Yes, but we don’t know when and where. There is not a specific budget for buying more sequencers, but we will definitely be looking at other platforms and buying other machines in the future. We always have, and we always will.
What are your criteria for choosing new platforms?
We are definitely committed to staying at the forefront, and whatever that takes, we will do it. We are constantly beta-testing new instruments and new versions of existing instruments. We would like to see that the technology has produced some reasonable amount of accurate sequence before making any kind of financial commitment; I think that's safe to say.
But also, the way our board of management works, they are very progressive, and we have a role in developing resources for the wider community. So we are interested in examining new technologies, and we can validate emerging technologies, possibly when they are not mature enough for other labs. Because we want to stay on the forefront of sequencing, we might jump a little bit sooner than a small lab that only has enough money to buy one instrument. Also, we have a mandate to share our experiences with other people, to give out information showing other people how to do it, which, again, would not be in a small lab's interest.
My group here, [which] I am building up now, is going to consist of scientists and perhaps an engineer who have experience in technology development. That means that the Sanger [Institute] can get involved in some more collaborative, earlier-stage, earlier-access [projects] than was possible in recent history, because previously, they were really just end users. My new role here allows us to do more speculative [things], to become involved a bit earlier with certain technologies, to take some chances on things that they would not have done earlier on. Previously, they would have only looked at mature technologies to bring in. [We are] looking at collaborations with some very early-stage companies, and we can help them get their technologies going.
Can you give a few examples of projects at the Sanger that involved the different next-generation sequencing platforms?
Certainly we are excited to be part of the recently announced 1,000 Genomes Project (see In Sequence 1/22/2008), which was driven by Richard Durbin here at the Wellcome Trust Sanger Institute. The interesting thing is that it's going to look for disease-causing and other variations in multiple human populations, but also, it's going to identify structural variants, like insertions, deletions, and duplications, which have not previously been studied in great detail. It's quite an audacious call at present, but one with a very large payoff. And we are going to be doing that on the Illuminas, at least for the present, with Richard Durbin involved with coordinating data analysis. It's quite a large project which focuses our interest.
We are also using the Illuminas for Mike Stratton's Cancer Genome Project, and for a gorilla-sequencing project. And Julian Parkhill will also be using Solexa-based sequencing for high-throughput pathogen projects. For example, [we are] looking at highly variable bacterial populations where there are so many mutations in any one population that it's very hard to get any answer about what's really causative; but if you look at large numbers of isolates, you can get a lot of information that you would not be able to obtain with any other method. These are not the only projects we are doing, but these are the high-profile ones.
We are using the 454 instruments, [which] are performing reliably, for pathogen sequencing predominantly. In fact, there has been a publication recently on Chlamydia trachomatis using 454 sequencing.
And we are just getting started with the ABI SOLiDs, [which are] fairly new here. We hope to use them to support some other projects, like the Cancer Genome Project.
What are the greatest challenges in using these new platforms, both technically and in dealing with the data they produce?
I think it’s safe to say that compared to what people are used to from the ABI capillary machines, these instruments don’t just work out of the box. You can’t just plug them into the wall and start analyzing the data on your desktop PC the way you used to. It takes a lot of development and support to feed these machines and to look at what comes out of them. That’s the technical challenge. And then they are constantly evolving, and the companies are learning how to support them, how to manufacture reagents, and things like that. There [are] a lot of teething pains.
In terms of the data side, a lot of people have talked recently about storage and compute requirements for these instruments. But that’s exacerbated by the fact that everybody wants to store the images for the moment, in case there is a new pipeline that comes out to analyze the data better in the future. But in my view, that’s a fairly manageable problem because you can just throw more money at it, buy more hardware, and you can solve that kind of problem. Of course that’s difficult for the small labs compared to the genome centers.
The biggest challenge is in knowing how to extract and analyze the data in an optimal way, because it’s data that we are not used to looking at. There are things like color discrimination, base calling, calibration, normalization of the data. A lot of this stuff is not worked out exactly, [in terms of] how it needs to be done. But the neat thing about it is, if you can solve some of these things and get better analysis techniques, you are going to get more base calls from the same data with much higher quality.
The other thing that’s hard at the moment is that the instrument suppliers are producing quality metrics that aren’t really the same as what people are used to. So in terms of the assemblers and aligners, people here are working hard on trying to use what they know about aligning sequences with quality metrics, as well as trying to figure out what those quality metrics mean for the new data. So it’s partly adapting, and partly changing the tools, so that you can get better performance for consensus and SNP calling. People don’t really know what cutoff thresholds to use. If you are talking about good and bad data, exactly what good means and what bad means is not clear. Where should we set our cutoff thresholds so that we get the best data but don’t throw away too much? That’s very tricky at the moment.
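[Ed. note: The trade-off Swerdlow describes, where a stricter quality cutoff yields cleaner base calls but discards more data, can be illustrated with a toy read filter. This is only a sketch: the mean-quality criterion and the Phred-style scores below are illustrative assumptions, not the Sanger pipeline's actual metrics.]

```python
# Toy illustration of the quality-cutoff trade-off: raising the threshold
# keeps only higher-quality reads but throws more data away.
# The Phred-style scores here are made up for illustration.

def filter_reads(read_quals, cutoff):
    """Keep reads whose mean Phred-style quality meets the cutoff."""
    return [q for q in read_quals if sum(q) / len(q) >= cutoff]

reads = [
    [30, 28, 25, 20],   # mean 25.75 -- decent read
    [15, 12, 10, 8],    # mean 11.25 -- poor read
    [35, 34, 33, 30],   # mean 33.0  -- very good read
]

lenient = filter_reads(reads, cutoff=10)   # keeps all three reads
strict = filter_reads(reads, cutoff=30)    # keeps only the best read
print(len(lenient), len(strict))
```

Moving the cutoff from 10 to 30 here drops two of the three reads: the same tension, scaled up to millions of reads, is what makes choosing thresholds for consensus and SNP calling so tricky.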
Do the manufacturers provide the same level of data analysis support they provide for Sanger sequencing?
Probably not. Some of the issues I just mentioned, I don’t think they understand them well enough themselves to help you too much, which is why it’s going to have to be [institutes] like Sanger and Broad and Wash U that are on the forefront of this, because the biggest [centers] are generating lots of data, and therefore they are the ones who are going to figure out how to look at that data. But we do work closely with the manufacturers.
Sanger is fairly comfortable with [doing such development]. We work with other similar institutions, [and] we work with manufacturers to improve the pipeline software and the algorithms for data analysis. Also, something the manufacturers just tend not to do is automate the entire workflow beyond just the instrument itself. Things like sample barcoding, automating sample sheets, QC of runs, and looking at patterns in the data are not easy. If we identify an issue with some of the runs, [we try] to figure out how that correlates with batch numbers of reagents and things like that. Those tools are definitely not provided, [so] we pretty much need to do all of that ourselves; it's what I call meta-analysis of the data.
What do you see as the greatest need for development this year?
To be honest, I would say getting into production. That’s my number one priority, getting into full production with all the machines running reliably all the time. That’s the next challenge. You definitely get good runs, but full production is the challenge.
And in terms of technology development? Last year, people said developing capture methods was really important …
Yes, and we are also working on that. There are various things like that, [such as] automated sample introduction in general. Right now, it’s very much ones and twos in terms of samples, so getting the whole sample production pipeline working smoothly, I think, is important. The manufacturers will continue to make the instruments more reliable and more productive. We have to make sure that the samples going into them are of high quality. Generally, [we are] working on sample prep methods, [including] pulling down specific sequences.
Also on automating sample prep?
Reliability [comes] first and automation second.
In a survey we ran last year, many users brought up the issue of sample prep, that it takes too much time, is too much hands on.
Certainly for all the platforms there are sample-preparation issues that haven’t been dealt with before. For the 3730s, they are all worked out; you know exactly how to do it, and the robots know how to make sample preps for capillaries. There [are] a lot fewer samples for the new next-generation instruments, but we still have to face that and solve it.
Where do you see sequencing technologies going long-term? How do you think they will be used five years from now?
[With] all the sequencing-by-synthesis technologies, including [sequencing by] ligation, the fierce competition is going to mean that the hardware will very soon be reaching its practical limits, I think. Those limits are set by how fast you can optically scan a surface. If you need to go much faster with a reasonably priced instrument, you are going to need improvements in CCD cameras. But in fact, that technology is quite mature now. So it’s not obvious that you are going to have a factor-of-10 improvement in CCD technology in the next five years.
My personal feeling is that the next leap in throughput and cost is going to come out of something more like a scanning technology, like the nanopore methods. But I also believe — whether it’s true or not remains to be seen — that those methods are going to be inherently inaccurate. Because they are scanning DNA very, very quickly, they are not going to be able to look at every base extremely accurately. But since a lot of genomes are going to be finished soon, things like resequencing and digital expression analysis will be the killer applications.
A nanopore method, if you could use it at very high coverage to work around the inaccuracies, might be good enough to look for consensus mutations, and accurate enough just to quantitate transcripts for digital expression analysis. I think there’s going to be an opening there for these methods. They have the potential to be very cheap and very fast, because they are generally label-free, and they are limited more by physical limits than by how fast you can scan across a surface.
In five years, if you can significantly undercut the $1,000 genome, that’s going to be ushering in an era of sequencing-based genetics for the average person, the average disease researcher in a lab. If you could do [sequencing for] $100 [per] sample, then you could do a 1,000 cases vs. 1,000 controls study for only $200,000, which is in reach of the average small lab. You are doing genetics, basically, by sequencing entire genomes, which is quite neat.
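[Ed. note: The back-of-the-envelope arithmetic in that scenario is easy to verify; a minimal sketch using the figures Swerdlow quotes.]

```python
# Cost of a sequencing-based association study at the quoted price point:
# $100 per genome, 1,000 cases versus 1,000 controls.
cost_per_genome = 100           # dollars per sample (quoted figure)
samples = 1000 + 1000           # cases plus controls
total_cost = cost_per_genome * samples
print(total_cost)  # 200000 dollars, within reach of a small lab
```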
It will then also open up all this individualized medicine that people have been talking about for a long time; it will finally be practical. Potentially, it’s usable fairly soon. But whether that’s five years or ten years [from now], [and] when the next big jump is going to happen, is quite difficult to predict at the moment.