Director of operations
Human Genome Sequencing Center, Baylor College of Medicine
Name: Donna Muzny
Position: Director of operations, Human Genome Sequencing Center, Baylor College of Medicine, since 2001
Experience and Education:
— Different positions at Baylor College of Medicine, since 1986
— MS, Genetics, Texas A&M University, 1986
— BS, Biology, Texas A&M University, 1982
Donna Muzny has been with Baylor College of Medicine’s Human Genome Sequencing Center since it was founded in 1996, when the National Human Genome Research Institute chose it as one of six pilot programs for the final phase of the Human Genome Project.
More recently, as director of operations, she has helped the center test and implement new sequencing technologies from 454 Life Sciences, Illumina, and Applied Biosystems.
In Sequence spoke with Muzny at the Biology of Genomes Meeting at Cold Spring Harbor Laboratory last week about the challenges of adopting new technologies, and how the center is using the new platforms.
How was the sequencing center equipped when you started back in 1996?
When we first started, we were a very small lab. We started with a couple of [ABI] 370s, where you could only run 16 lanes on a polyacrylamide gel, and you had to wait for an hour and a half or so to see the primer come through, and then track each lane. That was how it started with fluorescent sequencing.
How is the center equipped today?
This year, we have been doing less and less Sanger sequencing and investing more in the new technologies. We now have ten 454 GS FLX sequencers and two Solexa instruments [from Illumina]: a Genome Analyzer and a Genome Analyzer II. We also now have five Applied Biosystems SOLiD instruments on site.
We still have 3730s, but we have dropped about half of our fleet. I believe there are 35 or 36 instruments right now at the center.
It seems like Baylor has made a slightly different choice of the mix of next-generation sequencers than the other large sequencing centers, which have invested heavily in Illumina GAs. Why is that?
I think we had planned to do a bit more de novo sequencing in the beginning, so we went with 454. We have had excellent success using 454’s platform for microbial assemblies and de novo sequencing. And the impact that has made has put us in a good position to do the capture strategy now, using the 454 technology.
How did you test the different sequencing technologies, and what are their individual strengths?
We have three microbial genomes, with different GC contents, and a set of rat BAC pools for testing. The accuracies between the different platforms were pretty similar on the microbial genomes. I think the strengths and weaknesses of the platforms related to GC content. Assembly issues also contributed to how well the platforms did. I think they were all pretty comparable, with certain advantages determining whether or not they are going to be used in future projects.
What were the greatest technical challenges in implementing these systems?
Now that we are ramping to 20 runs a day with the 454, there is a lot of development of a pipeline [needed]. You have to get the libraries and emulsion PCRs ready at certain times, you have to track how much you recover from an emulsion PCR, and then of course you have to schedule and run the instruments. And of course it involves dealing with the software and the analysis offline. I think it is also a challenge to bring in the appropriate talent that you need, both on the bench and management-wise. With all the technologies, there is an issue of project management, where you have to understand the different types of applications and what is expected. For example, you might be doing all three platforms for a project and combining all the data. Keeping track of all that and pulling those projects together is a hard thing. And that part of it is growing.
I think each platform has had its own issues coming through the door, which you had to kind of work through. I think each platform has had its technical difficulties, its software issues, and performance issues. But because we get the instruments before they are out and available, we kind of expect a certain amount of being a guinea pig on these platforms, and having the patience to work through them. And we learn a lot in that process. It’s like having to make your own dNTPs, measuring them out, and learning the ins and outs of basic sequencing, rather than being given a kit to do something. I think overall, it’s been a really good experience, because every problem that you have, you learn something from it, [and] you understand the system a lot better.
Can you mention a few projects, and how you use the new sequencing platforms in these?
For the 1,000 Genomes Project, we are doing parts of all three pilot projects with 454, and on the SOLiD, we are doing parts of pilot 1 and pilot 2. We plan to generate about 100 gigabases of sequence data on the 454 platform for this project, and about 200 gigabases on the SOLiD platform.
Pilot 1 involves sequencing 180 people at low coverage. It was 2-fold, but I believe it has been bumped up to 4-fold [for one set of samples]. Pilot 2 is sequencing two trio sets with deeper coverage, and pilot 3 involves directed sequencing of a set of exons in about 1,000 people.
We are also involved in the Human Microbiome Project. For de novo microbial sequencing, we use 454 because it has a longer read length, and we use either the SOLiD or Solexa to pretty much fill in the rest. Primarily, the errors on 454 are indels, and the errors on the SOLiD and Solexa are primarily substitutions. So those two types of technologies really work well in a mixed platform.
We have also done quite a few special projects on the Solexa platform, like methylation studies, cDNA, ChIP-Seq experiments, and microRNAs, so more the [kinds of projects] that the short reads are very good at.
For the medium-size genomes, like insect genomes, I think the 454 platform will also be very useful, especially the upcoming XLR [extra-long] reads.
As we work on whole-genome mammalian assemblies, I think the new technologies will have a huge role in either upgrading existing mammalian genomes that we have done light coverage on [by Sanger sequencing], or possibly doing a whole primate for comparison [with humans].
For example, we are currently combining a low coverage of Sanger sequencing with 454 sequencing to do genome upgrades. We have 1,000 BACs in the pipeline for the rat genome, and we have finished 200 of those already. That will be directly applicable to the genome upgrade of the rat.
In the next year or so, what technical improvements are you planning to work on?
We are really working on the NimbleGen capture methodology. We brought the technology in house and we have trained at NimbleGen, so they make the chips and we do the hybridizations in house now, and that’s going pretty well. We hope to be participating in a number of projects where targeted sequencing is the mission for functional mutation discovery. You can see it in the 1,000 Genomes Project, but probably also in some of the cancer projects coming up. Potentially, we are going to use the capture technology for an autism project and an ion channel project. We have plans to look at the whole exome as well, and we are very excited about that aspect.
For the 454 platform, the XLR [reads] are coming out, and that’s looking really good. Also, our latest run on the SOLiD instrument was just really excellent; we managed to get 11.6 gigabases off of one run, and we were very pleased with that. I think all three vendors are really working towards improving their platforms, improving their chemistries, et cetera.
I think the upfront parts of the process are the ones that are really going to be the bottlenecks, because the library protocols right now are not easily automated. We need methods to make that a smoother process, a process that is less labor-intensive but also less finicky. It all has to do with how you prepare the libraries, how you prepare the fragments, pulling them off the capture, doing amplicon sequencing, doing cDNA sequencing. They are all upfront types of work that you have to perfect. That’s going to probably be our next focus, to make that whole process a lot more efficient.