Name: Hans Lehrach
Position: Director, Max Planck Institute for Molecular Genetics, Berlin; head of Department of Vertebrate Genomics, since 1994
Experience and Education:
Head of Department of Genome Analysis, Imperial Cancer Research Fund, London, 1987-1994
Group leader, European Molecular Biology Laboratory, Heidelberg, 1978-1987
Research fellow, Harvard University (Paul Doty's group), 1974-1978
PhD, Max Planck Institute for Experimental Medicine and Max Planck Institute for Biophysical Chemistry, 1974
Diploma in Chemistry, Technical University Braunschweig, Germany, 1970
Hans Lehrach heads the Department of Vertebrate Genomics at the Max Planck Institute for Molecular Genetics in Berlin and is involved in many projects — including the 1000 Genomes Project — that make use of the institute's 14 second-generation sequencing instruments. In Sequence visited Lehrach earlier this month to discuss some of his work, in particular how predictive models combined with sequencing data could improve cancer therapy. An edited version of the conversation follows.
Tell me about your involvement in the Human Genome Project, and other genomics projects that followed.
I was involved very early on in discussing the sequencing of the human genome. I was at the first Santa Cruz meeting [in 1985], where a very small group of aficionados discussed sequencing the human genome, at a point when it sounded a bit like science fiction.
We were [also] involved in a lot of functional genomics, developing new techniques. In 1987, we built the first arraying robot [at the Imperial Cancer Research Fund] in London and developed a lot of the machinery and materials which people still use today. High-density microtiter plates, high-density arrays, clone-picking robots, and other machines that we developed were then commercialized and bought by many genome centers and companies.
We were [then] involved in sequencing the human genome. We were fairly central in the work on chromosome 21, where the mapping was coordinated by Marie-Laure Yaspo from this department [here at the MPI in Berlin]. We also did quite a bit of sequencing. [Chromosome 21] was the second chromosome that came out in the Human Genome Project — the first one was chromosome 22, which the Sanger [Institute] did.
We have [since] been involved in a number of genome sequencing projects, and we started, very early on, to take advantage of some of the new technologies. We had one of the first Illumina [Genome Analyzers] in the world here. We had a paper on RNA-seq fairly early on [and] we have been using these machines for many applications quite extensively.
How are you currently equipped with second-gen sequencing technology?
At the institute, we have five Illumina [Genome Analyzers] at the moment, five [Applied Biosystems] SOLiDs, three 454s, and one Polonator.
At the moment, we are probably the second-largest center for second-generation sequencing in Europe, behind the Sanger Center. A lot behind the Sanger Center.
What are some major projects you apply these new technologies to?
The main project is a cancer project, Treat1000, and similar projects for characterizing the biology of tumor samples, [such as] the Mutanom project [which is funded by the German Federal Ministry of Education and Research].
[We are] trying to characterize tumors vs. normal tissue, model the biology of the tumor and the normal tissue, and processes like pharmacogenetics, and use this information to generate a virtual patient on which we can try drugs to optimize the therapy, to eliminate drugs [that] have only negative effects on the patient, and to also help to identify new, more targeted drugs.
A second [project], in which we also use deep sequencing very successfully, is for analyzing complex genetic crosses. For example, in a project with Bernhard Herrmann [here at the institute], we are trying to look for modifiers of intestinal tumors in mice, but also for modifiers of, for example, methylation or expression or splicing patterns, both in the tumor and in the normal tissue. This project we also carry out in collaboration with Jiri Forejt [at the Academy of Sciences of the Czech Republic] in Prague, where we use chromosome substitution strains to characterize genes that affect processes like expression, transcript processing, methylation, but also phenotypes, for example, tumor formation.
In the mouse, we can obviously do very complex genetic experiments. In combination with these new tools in genomics, there is an enormous amount we can learn. If we get any pathways out that are new, we can put those into models that we use on human patients and, hopefully, get better and better models, which would be able to predict better and better how a patient will react to specific treatments.
We use deep sequencing in [many] other projects, for example, to study the evolution of deuterostomes, simple organisms like Ciona or sea urchins.
We are also participating in the 1000 Genomes Project, where we are the only European center except for the Sanger [Institute].
[Also], a German component of the International Cancer Genome Consortium has just been announced, and we are writing an application at the moment, together with Peter Lichter [at the German Cancer Research Center]. We will focus particularly on transcriptome characterization, but [want to] participate in other things as well. There is a possibility of additional [German] components of the ICGC.
We are also working on early proof-of-principle experiments ... where the goal would be to sequence the genome of specific patient groups. We have one paper in preparation on sequencing 10 Alzheimer cases, using deep sequencing of the exome, in this case using NimbleGen enrichment and 454 sequencing.
Can you tell us more about Treat1000?
We are convinced that the combination of deep sequencing and predictive models will help us to treat tumor patients much better than the standard [treatment] at the moment.
At the moment, tumor patients, by and large, are treated by some first-line therapy, approved for one particular type of cancer, but very often without much regard for the individual biology of the tumor, the individual mutations, the expression changes, [or] the copy number variation.
Treat1000 is a first step to try to apply the combination of deep sequencing and modeling to 1,000 patients, to formally prove that this will give a much better choice of therapies, since this will lead to much more individualized therapy. [It] will help patients ... suffer fewer side effects, maybe even reduce the cost of treatment, and also help to develop new drugs, for example, by stratifying clinical trials, or using [the information] early in the drug-development process.
The goal is to use living cancer patients, not to use historical samples where the patient doesn't profit from the analysis, but to do the analysis on patients who are still under treatment, in some cases beyond the classical treatment. Patients are very often already at a very late stage in which clinicians don't have any options left, so the hope would be that we can identify drugs or drug combinations, which, even at that late stage, can help the patient.
Does this also involve new drug development?
I think in some cases, you might be able to achieve quite a bit by just being able to combine available drugs in new ways. I think it's also going to involve new drug development because cancer is not one disease, it's many. In a sense, we should treat cancer as a collection of orphan diseases, each of which requires, maybe, a specific set of therapies or combination of therapies.
Ideally, we should be in a situation in which we get the sequence of a patient, model therapy effects, and give the oncologist the possibility to prescribe a specific combination of components that have the best effect and the minimum side effects. It could be different for every individual patient.
How far has Treat1000 progressed?
We are just completing the sequencing of one patient [with metastatic melanoma] from Treat1000, and are using that to model the biology to generate a virtual patient. Even the early results have led to modifications of therapy.
We have the next few samples in the works — we have a few melanoma and colon cancer samples, [though] we might include other cases as well, [such as] prostate cancer. As we analyze the results, I think we will be better able to choose candidates.
This is early days — we are still in the process of putting all the information into those models, going systematically through all possible drugs, exploring the effect on this patient, maybe testing some of those drugs on cell lines derived from the patient, [or], if possible, tumor stem cells derived from the patient.
What is the approach you are taking with regard to sequencing these samples? How are you doing that?
We are using a combination of approaches. One is whole-genome sequencing, which at the moment we are doing mostly on the SOLiDs.
We have been using deep sequencing of the exome, enriching for many of the exons in the genomic DNA, and we are doing transcriptome analysis. We are trying to do microRNA analysis as part of the transcriptome.
[We] do that on the tumor, [and] at least part of that also on [an enriched] tumor stem cell population [that has] been isolated from the tumor, and on [normal] DNA [from blood] to identify which of the SNPs we find are pre-existing in the germline.
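[The comparison of tumor and blood DNA described above can be sketched as a set operation. This is a minimal illustration, not the group's actual pipeline: the variant tuples, the toy coordinates, and the function name `split_germline_somatic` are all hypothetical, and real variant calling would also have to account for sequencing errors and coverage differences.]

```python
# Hypothetical variant representation: (chromosome, position, ref, alt).
# All coordinates below are made-up toy values, not real calls.

def split_germline_somatic(tumor_calls, normal_calls):
    """Split tumor variants into germline (also seen in normal DNA)
    and candidate somatic (seen only in the tumor)."""
    germline = tumor_calls & normal_calls   # pre-existing in the germline
    somatic = tumor_calls - normal_calls    # tumor-specific candidates
    return germline, somatic


tumor_calls = {
    ("chr1", 1000, "G", "C"),
    ("chr2", 2000, "T", "A"),
    ("chr5", 5000, "A", "G"),
}
blood_calls = {
    ("chr1", 1000, "G", "C"),
    ("chr3", 3000, "C", "G"),
}

germline, somatic = split_germline_somatic(tumor_calls, blood_calls)
print("germline:", sorted(germline))
print("somatic candidates:", sorted(somatic))
```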
We are [also] working on techniques with which we should be able to sequence large numbers of individual cells by some multiplexing strategy. Hopefully, we should be able to get enough sequence out of each individual cell to be able to group them. Then, we have a deep sequence of a group of cells. Not necessarily a deep sequence of each individual cell, but if one-third of the cells are biologically equivalent, and we can pool the results on 300 cells, I think this would help a lot to be able to deal with admixtures by normal tissues, or subgroups of cells with different mutation status. But that's another stage. I'm sure it will be doable, but we will try the somewhat simpler things first.
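[The pooling argument above comes down to simple arithmetic, sketched here under stated assumptions. The per-cell depth of 0.25x is a hypothetical figure chosen for illustration, the function name `pooled_depth` is invented, and the calculation ignores amplification bias and uneven coverage between cells.]

```python
def pooled_depth(n_cells, per_cell_depth):
    """Approximate sequencing depth of a pooled group of equivalent cells,
    assuming reads from each cell add up evenly across the genome."""
    return n_cells * per_cell_depth


# Assumption: each single cell yields only shallow coverage (0.25x here).
per_cell = 0.25

# Pooling 300 biologically equivalent cells, as in the example above.
group_depth = pooled_depth(300, per_cell)
print(f"pooled depth for 300 cells: {group_depth}x")
```

Under these assumptions, shallow coverage per cell is enough to assign cells to groups, while the deep sequence comes from the group as a whole.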
It's quite amazing that we are now in a position to probably generate as much sequence information for a single patient as we generated in the whole Human Genome Project. It took 10 years, many groups, and roughly $1 billion to do [the human genome]. Now, the cost of sequencing has dropped dramatically, and I don't see an end to this process.
How do you want to use all that sequence information?
I think it's an unavoidable conclusion that we should use this information in medicine. And the only way I can see to use it is models. There are two basic approaches you could think about doing these things: one is statistical correlations — basically, you don't know anything, you just try to find out if you always find a specific set of features whenever you find a specific phenotype response. For example, the tumor is aggressive, or treatable by a specific therapy, and you have a similar expression pattern, or a similar mutation pattern.
The problem is that those patterns are only easy to establish if you look at very large groups of very similar patients. To do statistics, you need a large number of samples, and if those samples differ in important aspects, you can be easily misled. So if you do a pattern on the expression analysis level, but you ignore mutation data, or CNV data, then your patterns are going to be much weaker, so you may end up with a pattern that you find in Turks in Berlin, which doesn't replicate in Italians in Munich.
It is a powerful strategy, but it is less powerful than predictive models, because the conclusions we draw are to a large extent based on pathways that are conserved down to mice and even flies or yeast. Therefore, the probability that a group in Munich and a group in Berlin or in China are going to show different patterns is pretty small. I think causal modeling gives us a chance to look at individual patients, take all the information we can get, and predict, for example, his therapy response. Whenever you can rely on information that has been described in hundreds of publications, in years of cancer research, you are on very solid ground in predicting a very individual response.
What about new pathways that haven't been discovered yet?
I think there will be new pathways discovered, but it would not be a glorious success story for the war on cancer if the best we can do at the moment to treat patients is to throw away everything we have learned and look for patterns in gene expression, which we could just as well do on some strange new form of beetles to predict their mating behavior. In patterns, we don't take advantage, by and large, of this information we have accumulated. Obviously, we don't know everything, but we know some things, and we should take advantage of that.
Obviously, there are many areas in which we don't know enough about pathways, and in which we should rely on statistics, or maybe things like neural networks or genetic algorithms, to link phenotypes to the molecular description. But I see statistical patterns as a way to expand from our solid knowledge into the unknown. It's not the optimal way to go the whole way because we throw away a lot of information that is extremely useful.
There are a lot of people who do classical cancer research, hypothesis-driven analyses of single genes, and there are a lot of people who try to find patterns, which completely ignores [prior knowledge].
What we bring together is a very exhaustive characterization of the patient biology, which we can do by deep sequencing at lower and lower cost. And the models that we can then build on the information that we have about pathways and effects of mutations can go beyond that.
What do you think about the coming generation of sequencing technologies?
I will try to use them as soon as they become available. The information from companies looks extremely interesting. We have talked to Complete Genomics [and] I think if PacBio and VisiGen [which is now part of Life Technologies] come out with new instruments, which drop the sequencing costs another order of magnitude, we will be the first ones to try and take advantage.
For us, sequencing is a commodity. We are not married to any particular sequencing approach. We want to get as much information as effectively as possible to feed our models, to make the models as predictive as possible.
You are a co-founder of a company — Alacris Pharmaceuticals — that wants to take advantage of the new sequencing technologies. What is the company's goal?
We are trying to commercialize the individualized medicine aspect, simply because this cannot be done on a research basis. This requires a significant infrastructure, which could not be built up by a research institute. To really do that well — to potentially optimize the therapy for five million new cancer patients per year — you need a commercial system. Commercial systems can work at low profit margins to make it more accessible to more people more quickly.
Is Alacris already operative?
We are just in the last stages of getting it operative. It's now been in existence since last fall. We are completing the part in which founders and scientific advisory boards have to formally be put into the company, and we are talking to many people about getting funding.
We can start with a small amount of money and rely on organic growth funded by fees from patients or insurance companies, but the more money we have available to build up the infrastructure rapidly, the more rapidly we can address the clinical needs of the millions of people who end up getting cancer.