Name: Ivo Gut
Position: Associate director, leader of the SNP genotyping and technology development groups, CEA/Centre National de Génotypage, Evry, France, since 1999
Experience and Education:
Team leader, Max-Planck-Institute for Molecular Genetics, Berlin, 1996-1999
Research fellow, Imperial Cancer Research Fund, London, 1993-1996
Research fellow, Harvard Medical School, Harvard University, 1990-1993
PhD in physical chemistry, University of Basel, Switzerland, 1990
Undergraduate degree in chemistry, University of Basel, Switzerland, 1985
As associate director and leader of the SNP genotyping and technology development groups at the Centre National de Génotypage in Evry, near Paris, Ivo Gut is responsible for the center’s second-generation sequencing platforms.
Last year, he also became coordinator of a European Union-funded research consortium, READNA, that aims to develop new DNA sequencing technologies (see In Sequence 12/9/2008).
Last month, In Sequence spoke with Gut about his work at CNG, which is devoted to developing and applying genotyping and related genomic technologies, and his view of the new technologies.
Can you give some background information on the Centre National de Génotypage?
The CNG is one of the biggest genotyping institutes in the world. I think out of all the genome-wide association studies that have been run worldwide, about 20 percent or so were done at our place. We do genome-wide genotyping at a rate of 2,000 samples a week. We also hold about 250,000 DNA samples and have large quality-control facilities for handling, storing, and plating DNA.
We have six genotyping platforms. We are the Illumina user with the highest throughput in Europe, and a smaller share of our genome-wide studies is run on Affymetrix. We also use Sequenom and TaqMan, and then we have a few nifty additional technologies to do things that you cannot do on commercial systems.
Two years ago, we realized that with genome-wide genotyping, we are not going to get all the information we would like to get, and that we actually need to sequence. We have always been doing a limited amount of technology development, but we knew that we had to go towards genome-wide sequencing. That’s why we were motivated to set up and coordinate this EU consortium, READNA.
Then a year ago, the CNG integrated into the French atomic energy agency, CEA [Commissariat à l'Energie Atomique]. About 10 percent of CEA is dedicated to life sciences, and it has a lot of expertise in very high-volume computing — some of the biggest computers that exist in Europe actually belong to the CEA.
For us, this is very interesting, because on top of having expertise in nucleic acid analysis, we now have, through this integration, access to specialists in all sorts of fantastic technologies, including some stealth technologies for military applications, and we can explore opportunities to apply these to our own needs in nucleic acid analysis.
For example, we are collaborating with a group of informatics people who specialize in image analysis technologies and treating very complex signals. This is interesting for many partners within the READNA consortium, for the people who use, say, nanopore technology to read out the DNA. These guys don’t flinch at data being processed by computers at phenomenal rates.
What kind of second-generation sequencing instrumentation do you have at your center?
We currently have four Illumina Genome Analyzers, and we have a Genome Sequencer FLX on order. We will probably order another couple of Genome Analyzers, and I’m also trying to get an ABI SOLiD instrument. But it’s not that we just want to try all of them. For a long time, when I was talking about genotyping technologies, I used to say, ‘You need the best genotyping technology for the particular problem.’
I used to put up this slide in presentations, where I showed three cars, a Formula One race car, a rally car, and a tractor. And then I would say, ‘Decide which is the best car.’ And then I’d show a Formula One track, the picture of a desert, and a potato field. And then I’d say ‘Now think again about what you thought a moment ago,’ because it completely changes. A Formula One car on a potato field is going to be good for absolutely nothing. It’s really that you have to think what your problem is, and you have to choose the right tool to solve your problem.
For example, when you try to do whole-genome sequencing, the GS FLX can give you a little bit of added tweaking of your final sequence. You don’t have to generate huge coverage with it, just very little. And then we think that some applications might actually do better on the SOLiD instrument than on the Illumina instrument. There is one particular problem that I’m thinking of, and that is the directionality of RNA. Applied Biosystems is telling us that they have solved the problem of providing the directionality of the RNA molecules in RNA-Seq, and this is a very critical issue. As for error rate, it is not such a big factor in RNA-Seq experiments, but it does play into de novo sequencing versus resequencing.
But it looks like the read length of the SOLiD is going to be shorter than what you get from the Illumina instrument, and read length is important. We are now members of the International Cancer Genome Consortium, and to us, being able to read 100 bases rather than 40 bases is critical.
Are you involved in any technology development for second-gen sequencers?
Through READNA, we held a closed workshop in January, where we got Roche, Illumina, Applied Biosystems and some of the companies who do enrichment around a table, as well as people involved in standardization efforts for sequence data. We are going to try to set up standards that define quality control in conjunction with genome enrichment.
When you use a second-generation sequencer, you bias your results, because you use a reference sequence to align the reads. And then when you use an enrichment tool, you bias your results again, because you use a reference sequence to build your enrichment base. So you have two biases on top of each other, and then, how do you know that you are doing the right thing when you read out the result? To me, that is a real issue, so we are going to try to come up with some guidelines for how the quality measures should be done. And then we are probably going to suggest a comparison experiment where we compare array enrichment methods from different manufacturers, cross that with a certain type of sequencer, look at the results, and try to bring quality metrics to how well this actually works.
I’ve heard several companies selling enrichment methods come and say, ‘We can do this, let’s try this together,’ and then when I said, ‘OK, let’s go ahead,’ then they pulled back and said, ‘The protocol is not quite ready, and we don’t know whether we have to do this, or that.’ I think everyone is running into the same kind of difficulty with it. On paper, it sounds like a really simple idea, and when you actually try to do it, then it’s slightly more of a challenge than people actually would admit.
Can you mention some projects you are working on at your institute that involve second-generation sequencing?
The Institut National du Cancer, INCa, has committed itself to the International Cancer Genome Consortium, and we will do the sequencing for that. We will be working on Her2-positive breast cancer and alcohol-induced liver cancer; those are the two cancer types we are currently signed up for. We are getting the samples at the moment.
We are meant to analyze 500 samples. But with the amount of funding that’s available for each cancer type, at current prices, you can’t do complete sequencing. So we are going to do the complete sequencing on a subset, and then the exome only on the remainder. However, the cost of sequencing is bound to decrease, and we should eventually be able to do complete sequencing on all samples.
We also do resequencing of our association peaks from genome-wide association studies, using the Genome Analyzer. We resequence quite large genomic regions, up to about a megabase of DNA, in 50 individuals or so. The enrichment is done by brute-force long-range PCR.
And then we run a lot of ChIP-Seq and a lot of [methylated DNA immunoprecipitation]-Seq type experiments as part of collaborations, and we have done some bacterial sequencing as well.
For your cancer-related work, are you planning to sequence entire cancer genomes, or selected regions?
We want to do complete ones. What we are asked to do in the context of the International Cancer Genome Consortium is to resequence 35 megabases of exome per sample. But for our own work, we are actually going for complete genome coverage. Basically, we want to sequence the DNA from the tumor tissue and the peritumoral tissue, and, on top of that, we want to generate RNA-Seq profiles and MeDIP-Seq data from the tumor and from the peritumoral tissue.
We were also a founding partner of the Human Epigenome Project, and we have done quite a lot of work on epigenetics. So to us, this is really just an extension of what we have been doing anyway, applied in a more systematic way.