Group Leader, functional genomics and bioinformatics
Name: Edwin Cuppen
Title: Group Leader, functional genomics and bioinformatics, Hubrecht Institute, Utrecht, the Netherlands, since 2005
- Professor of genome biology, Utrecht University, since 2007
Experience and Education:
- Junior group leader, Hubrecht Laboratory/Netherlands Institute for Developmental Biology, 2002-2005
- Postdoc in functional genomics, Netherlands Cancer Institute and Netherlands Institute for Developmental Biology, 1999-2001
- PhD in cell biology, University of Nijmegen, 1995-1998
- MSc, molecular sciences, Agricultural University of Wageningen
Since 2002, Edwin Cuppen has been running the functional genomics and bioinformatics laboratory at the Hubrecht Institute in Utrecht, the Netherlands, formerly known as the Netherlands Institute for Developmental Biology.
His group has been making gene knockouts in a variety of model organisms. For example, over the last few years, his group generated almost 200 zebrafish knockouts for outside research groups, partly in collaboration with the Wellcome Trust Sanger Institute, which took on the sequencing portion of the effort.
Cuppen’s group was also one of the first to generate knockouts in the rat. Earlier this month, Cuppen spoke at a conference in Bielefeld, Germany, about a pilot project that used Applied Biosystems’ new SOLiD sequencer, which he hopes to buy this year, to identify successful gene knockouts in C. elegans more cheaply and quickly.
In Sequence asked him last week about the results of the project, and about his views of massively parallel sequencing technologies.
Tell me about your gene knockout technology and the SOLiD sequencing pilot project.
What we have already done for several years is to make gene knockouts in various model organisms. The traditional way of making genetic knockouts in the mouse is to use embryonic stem cells and do homologous recombination. Unfortunately, that technique is limited to the mouse, so we have to resort to other options to get genetic knockouts in the other model organisms that we are using.
We actually use a brute-force approach: we use a chemical mutagen to damage the DNA and introduce random mutations in the sperm cells. Then we set up a cross with a wild-type animal, which gives progeny with heterozygous mutations randomly distributed across the genome. We generate a large population of such F1 animals and isolate a bit of DNA from each, which we use to PCR-amplify the genes we are interested in. So far, we have used dideoxy sequencing to sequence their exons and look for induced mutations. What we hope to find are mutations that introduce a premature stop codon.
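The readout of this screen is whether an induced point mutation turns a sense codon into a stop codon. As a minimal sketch (a hypothetical helper, not the group's actual analysis pipeline), assuming the open reading frame is supplied in frame, starting at the ATG, on the coding strand, the check could look like this:

```python
# The three stop codons of the standard genetic code.
STOP_CODONS = {"TAA", "TAG", "TGA"}

def introduces_premature_stop(orf: str, pos: int, alt: str) -> bool:
    """Return True if substituting `alt` at 0-based `pos` of `orf` turns a
    sense codon into a stop codon (a nonsense mutation).

    Hypothetical helper. Assumes `orf` starts at the ATG, is read in frame 0
    and is given on the coding (sense) strand.
    """
    orf = orf.upper()
    if not 0 <= pos < len(orf):
        raise ValueError("position outside the open reading frame")
    start = pos - pos % 3                 # first base of the affected codon
    ref_codon = orf[start:start + 3]
    if len(ref_codon) < 3:
        return False                      # incomplete trailing codon
    offset = pos - start
    mut_codon = ref_codon[:offset] + alt.upper() + ref_codon[offset + 1:]
    # Only a change from a sense codon to a stop codon counts.
    return ref_codon not in STOP_CODONS and mut_codon in STOP_CODONS
```

For example, in the toy ORF `ATGCAAGGGTGGTAA`, a C-to-T change at position 3 turns CAA into TAA and is flagged, whereas a G-to-A change at position 6 turns GGG into AGG (arginine) and is not.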
We showed that this technique works in zebrafish about five years ago. We also do this in C. elegans now; we just published a paper on that in Genome Research (2007 May;17(5):649-58. Epub 2007 Apr 6.). We also did this in medaka [Japanese killifish] last year. This is also the only technique that works in the rat for making knockouts. We have made about six rat knockouts so far, and we are currently in the process of making more. So it’s a universal approach to generating knockouts in model organisms, both animals and plants.
At present, this approach is limited by the step of identifying mutations in the target genes. We first have to PCR-amplify our targets, the open reading frames of the genes, and then sequence them individually, on a single-amplicon, single-animal basis. We thought this step could potentially be boosted by massively parallel sequencing, because all you want is to find mutations in those regions. That’s why we set up a pilot project using SOLiD sequencing on some validated samples from the C. elegans library that we made, which we had already screened by dideoxy sequencing. We took a selection of those, and we also included some new amplicons and new samples for discovery.
We took 3 pools of 100 worms, amplified 80 different amplicons of 300 to 400 base pairs on average, and analyzed these in a run that used part of the capacity of the SOLiD sequencer. We did this in collaboration with Applied Biosystems. The aim was to find the known mutations and to identify novel ones. We were rather successful in finding the vast majority of the known mutations in this setup, and we did identify novel mutations that we could confirm by traditional dideoxy sequencing.
The major challenge in this project was that we were looking for alleles with a frequency of less than 1 percent in the pool: with 100 heterozygous animals per pool, each mutation is carried by only 1 of 200 alleles, or 0.5 percent. With conventional sequencing, it is of course not possible to detect such alleles in a single run. Other platforms have shown that allele frequencies of 1 percent can be seen. But being able to see them does not mean that you can routinely find all mutations with a frequency below 1 percent, and that’s the limit we were pushing in this pilot.
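The arithmetic behind the "less than 1 percent" figure, and why seeing a rare allele is not the same as reliably finding it, can be sketched with a simple binomial model. This is an illustration under loud assumptions (every allele amplified and sequenced equally, sequencing errors ignored); the function names are hypothetical.

```python
from math import comb

def pooled_allele_fraction(animals_per_pool: int, ploidy: int = 2) -> float:
    """Fraction of alleles in a pool that carry a mutation present
    heterozygously in exactly one animal."""
    return 1 / (animals_per_pool * ploidy)

def prob_detected(coverage: int, fraction: float, min_reads: int) -> float:
    """P(X >= min_reads) for X ~ Binomial(coverage, fraction).

    The chance that a position sequenced `coverage` times shows at least
    `min_reads` mutant reads. Assumes unbiased amplification and ignores
    sequencing errors, so it describes sensitivity only, not false positives.
    """
    below = sum(comb(coverage, i) * fraction**i * (1 - fraction)**(coverage - i)
                for i in range(min_reads))
    return 1 - below

f = pooled_allele_fraction(100)     # 1 in 200 alleles
for k in (1, 5, 10):                # candidate read-count thresholds
    print(k, round(prob_detected(1000, f, k), 3))
```

At 1,000X coverage a 1-in-200 allele is expected in only about five reads, so a detection threshold strict enough to suppress sequencing errors will also miss a substantial share of real mutations.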
How did you distinguish between the samples? Did you use barcoding?
No, we did not do that at this stage. The idea is that once we find something, it’s pretty easy to take the 100 animals and genotype them for that specific polymorphism, because we then know exactly which mutation to look for. And genotyping 100 samples costs just €10 [$14] or so. It’s not a concern at this stage, but it would also be possible to do smart pooling, or to split up the flow cell.
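"Smart pooling" is not spelled out in the interview. One common scheme is two-dimensional pooling: animals are arranged in a grid, each row and each column is pooled and sequenced, and a mutation that appears in one row pool and one column pool points to a single animal. The sketch below assumes a hypothetical 10 x 10 layout for 100 animals; it illustrates the general idea, not the group's plan.

```python
from itertools import product

def pool_layout(n_rows: int, n_cols: int) -> dict[int, tuple[int, int]]:
    """Assign animal IDs 0..n_rows*n_cols-1 to (row, column) grid cells."""
    return {r * n_cols + c: (r, c) for r, c in product(range(n_rows), range(n_cols))}

def locate_carrier(positive_rows: set[int], positive_cols: set[int],
                   layout: dict[int, tuple[int, int]]) -> list[int]:
    """Animals consistent with the pools in which one mutation was seen.

    One positive row plus one positive column identifies a single animal;
    more positives leave several candidates to genotype individually.
    """
    return sorted(animal for animal, (r, c) in layout.items()
                  if r in positive_rows and c in positive_cols)

layout = pool_layout(10, 10)                 # 100 animals, 20 pools
carrier = locate_carrier({3}, {7}, layout)   # mutation seen in row 3, column 7
```

The trade-off is more pools (20 instead of 1) but a higher allele fraction in each, since every row or column pool holds only 10 animals.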
What did you learn from the pilot study?
We did not find every known mutation, and we do feel that we pushed the limit by pooling 200 alleles. We should have used fewer alleles and more amplicons, for example by pooling 25 or 50 animals and increasing the number of amplicons per pool.
I believe this is not platform-specific; it applies to all the massively parallel sequencing platforms, because it’s a matter of distinguishing low-frequency real mutations from sequencing errors, and the error rates of all the massively parallel sequencers are much higher than we are used to in dideoxy sequencing.
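The redesign suggested above (smaller pools, more amplicons) can be explored with a back-of-the-envelope model. In this sketch the total sequence output, the 350 bp amplicon length and the 1 percent per-base error rate are hypothetical values chosen for illustration, not figures from the pilot, and the function name is also hypothetical.

```python
def pool_design(total_bases: float, n_pools: int, n_amplicons: int,
                amplicon_len: int, animals_per_pool: int,
                error_rate: float) -> dict[str, float]:
    """Idealized per-position statistics for a pooled amplicon screen.

    Assumes reads are spread evenly over all amplicons in all pools, that a
    mutation is heterozygous in exactly one diploid animal, and that
    sequencing errors fall evenly on the three non-reference bases.
    """
    coverage = total_bases / (n_pools * n_amplicons * amplicon_len)
    allele_fraction = 1 / (2 * animals_per_pool)
    error_fraction = error_rate / 3   # errors that mimic one specific base
    return {
        "coverage": coverage,
        "mutant_reads": coverage * allele_fraction,
        "error_reads": coverage * error_fraction,
        "signal_to_noise": allele_fraction / error_fraction,
    }

# Hypothetical budget and error rate, for illustration only.
pilot_like = pool_design(84e6, 3, 80, 350, 100, 0.01)
smaller_pools = pool_design(84e6, 3, 160, 350, 50, 0.01)
```

Under these assumptions, halving the pool size while doubling the number of amplicons keeps the expected number of mutant reads per mutation the same but doubles the ratio of real signal to background errors at each position.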
How could massively parallel sequencing eventually improve your process?
One thing that we will focus on in the next few years is making this more efficient. We are already pretty efficient, but we have been able to make only 100 zebrafish knockouts in the last year. Of course, if you want to go for 30,000, that would take far too long. So we have to improve the technology, and massively parallel sequencing could be one of the options. We are exploring others as well, but this is one.
What are the other options?
This is an idea that we proposed in our recent paper on C. elegans. We are interested in mutations that introduce a stop codon and thus make a knockout. There are actually only a limited number of positions in the genome that can change into a stop codon, and which ones depends on the mutagen that you use. In C. elegans that’s pretty simple: it’s EMS, a mutagen that almost always changes a G to an A or a C to a T. If you map that spectrum onto the open reading frames, you can identify every position that can become a stop codon. We called that part of the genome the “mutome.” The worm genome is 100 million base pairs and the C. elegans transcriptome is 25 million base pairs, but the mutome is only half a million positions. All you have to do is look at those half million positions. In principle, any genotyping technology could do that, so we are exploring genotyping technologies for this purpose as well, such as Illumina’s or Affymetrix’s. We have had some promising results from those, too.
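The mutome idea can be made concrete for a single ORF. Under the simplifying assumption that EMS produces only G-to-A changes on either strand (seen as G to A or C to T on the coding strand), the only sense codons that can become stop codons are CAA, CAG and CGA (C to T at the first position) and TGG (G to A at the second or third position). The hypothetical function below enumerates those positions; it ignores splicing, alternative isoforms and frame annotation, all of which a genome-wide count would have to handle.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
# Simplifying assumption: EMS causes only G->A on either strand, which shows
# up on the coding strand as G->A or C->T. No other changes are modeled.
EMS_CHANGES = {"G": "A", "C": "T"}

def mutome_positions(orf: str) -> list[int]:
    """0-based ORF positions where a single EMS-type transition would turn
    a sense codon into a stop codon (the 'mutome' of this ORF)."""
    orf = orf.upper()
    positions = []
    # Step through complete codons only, in frame 0.
    for start in range(0, len(orf) - len(orf) % 3, 3):
        codon = orf[start:start + 3]
        if codon in STOP_CODONS:
            continue
        for offset, base in enumerate(codon):
            alt = EMS_CHANGES.get(base)
            if alt is None:
                continue              # A and T are not EMS targets here
            mutant = codon[:offset] + alt + codon[offset + 1:]
            if mutant in STOP_CODONS:
                positions.append(start + offset)
    return positions
```

Summing `len(mutome_positions(orf))` over every annotated ORF would give a genome-wide count of the kind quoted above; only those positions would then need to be genotyped.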
So you’re not sure yet which technology you will eventually use?
For this application, we do not know. In the end, it comes down to cost.
Why did you choose ABI’s platform over Illumina’s or 454’s for your pilot project?
When you pool many samples, you need very high coverage, at least 1,000 to 10,000X of your region, so you need a lot of reads. Primarily for that reason, the 454 system is somewhat suboptimal, because it is limited in throughput, that is, in the number of bases it can read per run. I did a calculation for that platform, and it’s not cheaper than what we currently do, simply because of the number of bases you get and the cost per run.
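The coverage requirement translates directly into sequence output. A minimal sketch, using the pilot's 3 pools and 80 amplicons with an assumed 350 bp amplicon length (the midpoint of the 300 to 400 bp range); the function names and the per-run output are hypothetical parameters, not vendor specifications:

```python
import math

def bases_needed(n_pools: int, n_amplicons: int, amplicon_len: int,
                 fold_coverage: float) -> float:
    """Total sequence output (bases) needed so that every amplicon in every
    pool reaches `fold_coverage`. Assumes perfectly even coverage and no
    unusable reads, so real requirements will be higher."""
    return n_pools * n_amplicons * amplicon_len * fold_coverage

def runs_needed(total_bases: float, bases_per_run: float) -> int:
    """Whole instrument runs required for a given output per run."""
    return math.ceil(total_bases / bases_per_run)

# 3 pools x 80 amplicons x ~350 bp = 84,000 bp of target.
low = bases_needed(3, 80, 350, 1_000)     # 1,000X
high = bases_needed(3, 80, 350, 10_000)   # 10,000X
```

This gives 84 million bases at 1,000X and 840 million at 10,000X. Comparing platforms then comes down to dividing such totals by each instrument's output per run and multiplying by the cost per run, which is why `bases_per_run` is left as a parameter.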
So more reads are beneficial, and both Solexa and Applied Biosystems could potentially do this job better, because they generate more bases per run. Why did we choose Applied Biosystems? As I said, we are fighting low-frequency real mutations against error rates. The two-base encoding used in the SOLiD system, where every real mutation is confirmed by two different observations in color space, as they call it, helps to reduce the noise of the system and to increase the discovery rate of real mutations. I think that theoretical consideration is very important for mutation discovery. I do believe that the combination of many reads and this two-base encoding is a very good one for mutation discovery and detection.
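Why a real mutation shows up as two observations in color space can be demonstrated with SOLiD's two-base encoding, in which each color reports a pair of adjacent bases. With the base indices A=0, C=1, G=2, T=3, the color of a dinucleotide equals the bitwise XOR of the two indices. The function names below are illustrative, not part of any vendor software.

```python
# Base indices for SOLiD two-base ("color space") encoding.
BASE_INDEX = {"A": 0, "C": 1, "G": 2, "T": 3}

def to_color_space(seq: str) -> list[int]:
    """Encode DNA as one color (0-3) per adjacent base pair: the XOR of the
    two base indices."""
    idx = [BASE_INDEX[b] for b in seq.upper()]
    return [a ^ b for a, b in zip(idx, idx[1:])]

def changed_colors(ref: str, other: str) -> list[int]:
    """Color-space positions at which two equal-length sequences differ."""
    pairs = zip(to_color_space(ref), to_color_space(other))
    return [i for i, (x, y) in enumerate(pairs) if x != y]

# An internal single-base change alters both colors that include that base,
# and both are XORed by the same value; an isolated measurement error
# alters only one color, so the two can be told apart.
snp_colors = changed_colors("ACGTA", "ACATA")   # G->A at base 2
```

Here `to_color_space("ACGTA")` is `[1, 3, 1, 3]`, and the G-to-A change alters colors 1 and 2, the two measurements that span the mutated base.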
How did this collaboration come about, and when did it start?
We are not a large sequencing center, but we have this specific application of mutation discovery, and for that we do a lot of sequencing. We have two 3730xl machines here that run almost around the clock. We are in a unique position worldwide in applying this approach to various model organisms. Because we have been in good contact with Applied Biosystems over the years on traditional sequencing technology, we started talking about setting up a project to see what high-throughput sequencing could do for this application.
We started discussing this experiment last fall and began the experiments around Christmas. The actual sequencing was done sometime in March. This was with the early system; there have been quite a few improvements since, and later data that I have seen and played around with show gains in both throughput and mutation calling.
Are you planning to acquire a SOLiD sequencer now?
Yes, acquiring such an instrument is definitely one of our first goals, hopefully this year. This is just one application in our institute, but of course we all see that next-generation sequencing is an enabling technology. It makes many other experiments possible as well, ranging from chromatin [immunoprecipitation experiments] to looking at small RNAs, which we currently do in collaboration with Applied Biosystems. We are looking at all those applications, and we do feel that the platform could very well meet our sequencing needs over the next few years.