Johan den Dunnen
Leiden Genome Technology Center
Name: Johan den Dunnen
Position: Head, Leiden Genome Technology Center; associate professor of human and clinical genetics, Leiden University Medical Center, the Netherlands, since 1992
Experience and Education:
— Postdoctoral fellow, department of human genetics, Leiden University, 1986-1991
— PhD in molecular biology, Katholieke Universiteit Nijmegen, 1987
— Undergraduate degree in biology, Katholieke Universiteit Nijmegen, 1981
Last week, researchers at the Leiden University Medical Center in the Netherlands said they had sequenced the genome of one of their colleagues, a clinical geneticist named Marjolein Kriek, to eight-fold coverage using an Illumina Genome Analyzer. The sequence data was generated as a “side project” at the Leiden Genome Technology Center, one of the first users of the sequencing technology developed by the former Solexa.
In Sequence caught up with LGTC head Johan den Dunnen last week to discuss the status of this project and to talk about how second-generation sequencing is used at the center.
How did this project come about? Why did you decide to sequence a human genome?
It was just to see what’s possible at the moment. People have been saying that we can soon sequence a human genome. We said, ‘We can do it now,’ and that’s what we did.
We have an Illumina Genome Analyzer at the Leiden Genome Technology Center, which we obtained in December 2006 in an early-access agreement with, at that time, Solexa. As far as I know, we were one of the first two sites in Europe to obtain the system.
We did this as a side project to test the system that we acquired. Whenever we needed to run test samples, or to test a new improvement of the system, we ran this specific DNA sample. We wanted to find out how we could do such a project technically, computationally, and analytically.
How much did the project cost, and where did the funding come from?
Reagents and consumables cost approximately €40,000 [$62,000]. We have in our facility some funding to acquire new technology and test it out, which we used for this project.
Did you use paired-end reads?
We can only do unpaired reads at the moment. We said very early on that we would be very eager to be early users of the paired-end technology, and we have tried to obtain a paired-end module since last October, but it is not available yet.
With paired-end reads, we would of course have gotten more out of the data than we have now. Without paired-end reads, we cannot analyze structural variants, such as insertions, deletions, inversions, and duplications, which are of interest in a clinical setting.
According to the press release, you have generated 22 gigabases of sequence data so far. Are you planning to increase this coverage?
We are in the middle of analyzing the data at the moment, which came from about 15 runs on the instrument, each generating 1.5 gigabases of data on average. But as soon as we get the paired-end capability, we need to test this, and we will again use this sample and increase the coverage even further. However, we will not go over 20X coverage.
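The run and yield figures quoted above can be turned into a rough coverage estimate. The sketch below is illustrative only: the genome size of 3.1 gigabases is an assumption that does not come from the interview, the function names are hypothetical, and the calculation ignores reads that fail to align.

```python
import math

# ASSUMPTION: haploid human genome size in gigabases. The interview does
# not say which genome size underlies the reported "eight-fold" coverage.
HUMAN_GENOME_GB = 3.1


def total_yield_gb(runs, gb_per_run):
    """Total raw sequence yield in gigabases for a number of runs."""
    return runs * gb_per_run


def fold_coverage(yield_gb, genome_gb=HUMAN_GENOME_GB):
    """Average fold coverage: total bases divided by genome size."""
    return yield_gb / genome_gb


def runs_for_coverage(target_fold, gb_per_run, genome_gb=HUMAN_GENOME_GB):
    """Smallest whole number of runs that reaches the target coverage."""
    return math.ceil(target_fold * genome_gb / gb_per_run)


if __name__ == "__main__":
    yield_gb = total_yield_gb(runs=15, gb_per_run=1.5)
    print(f"Total yield: {yield_gb:.1f} Gb")
    print(f"Estimated coverage: {fold_coverage(yield_gb):.1f}X")
    print(f"Runs needed for 20X: {runs_for_coverage(20, 1.5)}")
```

Under these assumptions, 15 runs at 1.5 gigabases each give about 22.5 gigabases, in line with the 22 gigabases reported, and a coverage of a little over sevenfold. The exact figure depends on the genome size and the actual yield per run.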
Who is going to perform the bioinformatics analysis?
When we did the project, we saw that technically, generating the sequence data is not a problem. However, with all the computer and data storage requirements, it’s at the limits of what a small facility at an academic hospital can do. Computationally, we are doing our best, but we have asked for help from the Wellcome Trust Sanger Institute and Illumina.
Are you going to place the sequence data in a public database?
Yes, as soon as we have done the bioinformatic analysis. We expect that near the end of the year, the data will be made public. Of course we would also like to publish the results in a journal.
What kind of information are you going to withhold?
That depends on Marjolein Kriek, whose genome we sequenced. At the moment, she has not indicated any regions that she would not be willing to make public.
What are you hoping to learn from this project?
The main interest, in our case, is ultimately to apply human genome sequencing to find causes of genetic disease.
You need to align the data to the reference genome, find variants, and then for the variants you find, you need to qualify whether they are potentially pathogenic or not. We consider that the most difficult part of the project. It’s not so difficult to do the sequencing and to call the variants, but then each variant — and it might be up to a million per patient — needs an answer with regard to the disease of the patient. Right now, there is no tool available where you can submit a million variants and then get back a result.
One project that is trying to address this is the Human Variome Project, which just had a planning meeting. The project, which has no funding at the moment, was put forward by a group of interested researchers who met for the first time in Melbourne in 2006, and is led by Dick Cotton, a professor at the University of Melbourne.
The project will focus on all kinds of genomic variation, but one outcome would be a tool that you can use to separate non-disease-related variations from disease-related ones. For example, it would include all SNPs that come out of genome-wide association studies.
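The kind of separation described here can be pictured as a lookup against curated variant catalogs. The sketch below is purely hypothetical: the `Variant` record, the `triage` function, and the example coordinates are invented for illustration and do not represent the Human Variome Project or any existing tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """A single-site variant. Hypothetical record, for illustration only."""
    chrom: str
    pos: int
    ref: str
    alt: str


def triage(variants, known_benign, known_pathogenic):
    """Sort variants into three buckets by looking them up in two catalogs.

    Variants found in neither catalog remain unclassified; these are the
    ones that would still need an answer.
    """
    buckets = {"benign": [], "pathogenic": [], "unclassified": []}
    for variant in variants:
        if variant in known_pathogenic:
            buckets["pathogenic"].append(variant)
        elif variant in known_benign:
            buckets["benign"].append(variant)
        else:
            buckets["unclassified"].append(variant)
    return buckets


if __name__ == "__main__":
    # Made-up placeholder variants, not real data.
    a = Variant("chrX", 1000, "A", "G")
    b = Variant("chr1", 2000, "C", "T")
    c = Variant("chr2", 3000, "G", "A")
    result = triage([a, b, c], known_benign={a}, known_pathogenic={b})
    for label, items in result.items():
        print(label, len(items))
```

With up to a million variants per patient, the unclassified bucket is where the real difficulty lies: a lookup like this only helps to the extent that the catalogs are complete.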
Are you planning to sequence more human genomes after this?
In cases where we have too few family members, and linkage association studies are not possible, we need to interrogate the entire genome. For the future, we consider whole-genome sequencing an option to find the pathogenic variant that affects a disease. But we know that today, we can get the sequence but not the answer yet.
In the near term, we might focus on X-chromosome-linked diseases. Our department has always been interested in X-linked diseases, and we have some examples of such diseases. With flow-sorting, we could pull out the X-chromosome and sequence it. That would make our job quicker, because you only sequence the X-chromosome, and also, you have fewer variants to worry about.
Are you involved in any other research studies that involve whole-genome sequencing?
Not at the moment. We might get involved in the 1000 Genomes Project and do some anonymous Dutch genomes. The goal of that project is to study variations in healthy people, and we are certainly interested in doing that.
Are you planning to acquire any more next-generation sequencers?
Because the Illumina system was fully booked quickly after it was installed, we bought a second system last month, a Genome Analyzer II.
Did you also consider alternative systems, like the Applied Biosystems SOLiD?
We looked at the SOLiD, and it’s a great machine. In general, our facility always tends to purchase an alternative system when we buy a second one, to offer our customers a choice. In this case, because the pressure was so high on obtaining data from the Illumina machine, we decided to go with the same system again, because you don’t need to learn anything new. That was, in fact, the reason. If we had obtained the SOLiD, we would have needed to train somebody to learn the sample preparation, learn the technology, et cetera. That means it would have taken three to six months before we could have started generating data. And that was not wise when so many people were waiting for their results.
What has your instrument mostly been used for?
Because it is in a facility, it is mostly used for microRNA profiling, ChIP sequencing, and gene expression profiling using the SAGE/tag-like approach. A lot of people are doing microbial genome sequencing: partly resequencing, but partly also de novo sequencing. The biggest problem for de novo sequencing is the lack of good assembly software. The bigger the genome, the more problematic the assembly becomes.
How much bioinformatics support do you provide?
Our support is very limited. We have somebody who provides a pipeline for the data analysis, but that goes only up to a basic level. It’s up to the customer to get specific answers from the samples.
How large is the Leiden Genome Technology Center, and what other services do you provide?
In the LGTC, we currently employ six people, who take care of clone libraries; DNA sequencing, including ‘normal’ sequencing, pyrosequencing, and Solexa sequencing; array technology, including home-spotted printing, Affymetrix, Illumina, Agilent, NimbleGen, and FlexGen; PCR-based SNP typing, including arrays, TaqMan, BeadXpress, Fluidigm, Idaho, and Roche; sequence variant detection by melting curve analysis; et cetera. In all cases, customers can hand in their sample and we perform all the work, or they can come to the facility just to perform endpoint analysis, for example an array scan.
Overall, we are not a ‘typical’ Dutch core facility, since we are very fortunate to have many more systems available than others, offering a choice of platforms per application, and typically we are the first in Holland to acquire a specific technology or system. At the moment, for example, our Fluidigm and FlexGen systems are unique in Holland, and maybe even in Europe.
We collaborate on most services with ServiceXS, a commercial service provider in Leiden.
Why did you decide to publish a press release on the human sequencing project, even though it is not finished yet, and what was the response?
The analysis is not finished, but we looked at a lot of things already. Last week, there was an annual meeting of the press with scientists in Holland, where we talked about this work in public for the first time. We got an enormous amount of interest. It is the first woman, and the first Dutch person, to be sequenced, so it made the future come very close for people in Holland.