Name: Samuel Levy
Position: Director of Genomic Sciences, Scripps Genomic Medicine Program, Scripps Health, since 2009
Experience and Education:
Senior scientist, later director and professor of human genomics, J. Craig Venter Institute, Rockville, Md., 2002 to 2009
Senior scientist and lead, informatics research, Celera, 1999 to 2002
Research associate, later senior research associate; Molecular, Cellular, and Developmental Biology, University of Colorado, Boulder, 1989 to 1999
Postdoc, École Normale Supérieure, Paris, 1986 to 1989
PhD, Department of Botany/Zoology, University of Bristol, UK, 1986
BSc in biophysics, University of Leeds, UK, 1982
Sam Levy recently joined the genomic medicine program at Scripps Health as director of genomic sciences, where he will direct human genome sequencing efforts at the Scripps Translational Science Institute.
Previously, he spent seven years at the J. Craig Venter Institute, most recently as director and professor of human genomics. Two years ago, he was the lead author on a paper in PLoS Biology describing the genome of Craig Venter, the first published genome of a named individual.
In Sequence spoke with Levy last week about his new job and his plans for human genome sequencing studies. Following is an edited version of the conversation.
When did you start your new job as director of genomic sciences at Scripps Health, and how big is the genomic medicine program? What attracted you to this position, and to move from the East Coast to California?
I started on Sept. 21, and the Scripps genomic medicine group is about 50 people strong, including administrative staff as well as lab staff, scientists, and senior researchers.
The change was rather a whirlwind. Eric Topol [the director of STSI] mentioned to me in June that he was interested in filling the genomics position, and asked whether I knew anyone. I thought that my background, and the interactions I have had with the institute, would put me in a good position to work at Scripps with this group. I think he was very pleased by that, as was I. [Also], the kind of research done here is certainly moving in the direction that I was hoping to work in for the long-term future: enabling a bit more genomics and clinical care, as well as doing all the basic research that I know and love doing, as I have done at the J. Craig Venter Institute for the last seven years.
What kinds of research will you be doing at Scripps?
The kind of research I will be contributing to will certainly involve a lot more direct use of genome sequencing in trying to understand disease etiology. The hope is that there would be a research component, in which we would work with patient populations and try to understand how genetic variants contribute to their particular disease phenotypes. Because the Scripps healthcare system is one of the groups with whom we work (the other one is the Scripps Research Institute), we actually have large patient populations that potentially could help these disease studies.
The other component is to use existing evidence with a genomic flavor, whether it's generated by our group here or by other groups worldwide. If it helps in any way to understand how clinical care should be delivered, then we should employ it. For example, mutations in the CYP2C19 gene have recently been shown in more detail to influence the ability to metabolize Plavix. Those kinds of data can be used directly in a clinical setting and enable us to guide patient care in a very direct fashion.
Can you give a few examples of sequencing studies that could help you better understand disease etiology?
It’s a very broad-based approach. The example I just mentioned was in cardiovascular disease. We are also working on cancer — we have a developing cancer program that predated my arrival that attempts to sequence primary tumors of patients that are in clinics currently through the Scripps healthcare system. And the hope is to try and understand whether whole-genome sequencing can be employed in a fashion to guide our understanding of the kinds of genetic changes in cancer on the one hand — that's clearly a more research-based component — but also, on the other, to see if particular drugs that currently are available would be applicable. That kind of study shows the dual nature of our approach. One is research, and trying to understand genetic variation in tumors. On the other hand, [it also encompasses] the ability to apply some of that knowledge to guide the kind of drug therapy.
A colleague, Sarah Murray [director of genetics], is working on a breast cancer susceptibility study, trying to screen individuals for predisposing mutations using genotyping approaches. I think there are other studies that we imagine will employ genome sequencing approaches that will develop over time.
Clearly, a third component [of my research] is how I got to know the people at Scripps [while working at JCVI] — trying to understand what the limitations of current sequencing platforms are, and how best to apply them for sequencing human populations.
One of the other pieces of work we started doing with them was to understand the genetics of healthy aging and extended health span of individuals, and we will certainly be continuing that work. Eric Topol's group has collected this wonderful population of elderly individuals who have never had a major instance of disease in their lives. The idea there is trying to understand how any genetic variants they have might contribute not just to longevity but also to healthy aging. It's a different approach from many disease-oriented studies, where your primary population is a particular disease group. Here, we are looking at a healthy phenotype.
What scale are the sequencing projects you are planning?
I think the approach is very much tailored to the kind of study. It's appropriate to look in large populations — hundreds, maybe thousands of samples — if we have an understanding of what we are targeting, which genes we are interested in. We can essentially target subsets of genes, and therefore target a larger number of patients as a consequence.
The other extreme of the spectrum is looking, as groups are doing now, at either rare diseases or monogenic diseases, where a few individuals in either families or small populations possess well-defined phenotypes. There, sequencing genomes or sequencing exomes might be the approach to use. And I think we are going to, as we proceed with studies, [look] across this wide spectrum.
We are doing exome-based approaches now for some of the studies I just mentioned to you, and we are also trying whole-genome-based approaches. But clearly, with whole-genome-based approaches, we are not looking at hundreds of samples but tens of samples. We are working with Complete Genomics, for example, to try and understand some of the genome sequences that we can achieve with their methodology to help us understand some of the phenotypes that we are looking at in the healthy aging population.
How are you equipped with sequencing instrumentation at Scripps?
We have the Illumina and the ABI SOLiD platforms currently in house, one of each for the moment. A lot depends on the throughput that we need, and certainly, the SOLiD machine was acquired with the understanding that the single Illumina machine was insufficient. We are going to evaluate, over the next year or so, to what extent we need to expand that infrastructure.
As well as the pre-existing technologies for sequencing, we are evaluating Complete Genomics and also Pacific Biosciences. Even though we are a small group, not a genomics institute, we are trying to enable a better understanding of how any of these approaches would be applicable for the studies that we have.
We also have a compute infrastructure that leverages some of our more basic biology collaborations with the Scripps Research Institute. That involves a lot of numerical analysis of data with my colleague Nick Schork, who has a lot of experience in computational biology. He has ongoing collaborations in analyzing either genotyping or other complex datasets that involve phenotypes. Through that, we have access to a very large compute grid of over 3,000 nodes, and that has been the methodology by which we have been processing lots of sequencing data to date. We have also identified the need to build up a more dedicated compute grid in house, and that's actually one of the things I'm tasked to do. We are working further on what that would look like. This is going to be very tailored to our needs; it's not going to be a huge compute grid that one would have in a genome center. It has to be tightly integrated into what we are trying to achieve for the sequencing, and also, what we are trying to achieve in terms of data analysis. For some sequence datasets, we don't need a very large compute infrastructure. If you use Complete Genomics, some of that need is essentially mitigated by their approaches and their compute grid.
How much will you be outsourcing to companies like Complete Genomics?
Clearly, anybody using a whole-genome sequencing service might have different goals in mind as to how they would use the sequence. What we are trying to do with them is very much tailored to understanding, initially, what we will receive from them, and how that fits into our project goals. If it fits in well, we will use them further. It's just a question of trying to see where we are first.
What are the barriers for the wide-scale adoption of genomic analysis in the clinic, and what do you see as promising clinical applications?
There are two kinds of barriers I see. One is the inability to establish a link between the kinds of genotypes that each of us has in relation to our disease susceptibility and how those changes affect disease progression. That, clearly, is a large topic, and we certainly see ourselves contributing to this in the near future in terms of our ongoing research, using the kinds of populations that I mentioned to you.
The other kind of barrier is understanding the utility of genome sequencing. For example, in the cancer field, we can see the utility of sequencing a whole genome, even though currently we don't understand whether every single rearrangement contributes to tumorigenesis or whether it just goes along for the ride.
Clinical care typically tests for known outcomes, which is a perfectly reasonable approach. Where genomics is currently, we can accumulate large datasets. We are still getting to grips with the utility of these datasets, and that should not prevent us from generating them for individuals in a clinical setting, so that either now or sometime in the near future, those data can be used in a very direct fashion. Part of it is education: being able to explain to a clinician how much they will learn from a particular experiment, whether it involves sequencing a set of genes, the protein-coding genes, or a whole genome. And then, on the research side, there is the building of a large database of genetic variation that can be associated with phenotype. Over time, that will accumulate and its value will just increase.
Another thing I did not mention – some of our understanding of genetic changes has been primarily focused on the base pair changes in the genome. But clearly, the epigenetics of each of our cells is very important in understanding disease progression. We do envisage doing a lot more studies — some of them would be with colleagues at the Scripps Research Institute — which would involve understanding epigenetic changes during cancer, for example, and methylation, and the changes in chromosome architecture. Using these kinds of approaches that are not at this juncture ready for direct application in the clinic, we can start understanding, through basic research, how potentially they can be used for the long-term future. We have collaborators at UCSD, at the Salk Institute, and also at TSRI to potentially help us.
What did you focus on during your last few years at the Venter Institute?
There are several different areas of effort. One is sequencing the whole genome of Craig Venter. We have extended that work to use newer technologies to sequence his genome, to essentially understand how well these technologies perform, and what are their limitations. That has not been published yet; it's an ongoing piece of work.
Also, we were working with investigators who had been approved through the National Heart, Lung, and Blood Institute's resequencing and genotyping contract. They proposed a project, [and] we would meet with them through a teleconference and discuss with them the best possible way of doing their targeted sequencing project.
Over the last four years, we have worked with about 15 different groups around the country doing targeted sequencing using Sanger-based capillary electrophoresis, and identifying variants using in-house-built software. Just prior to my departure, we migrated that contract to use some of the new technologies to achieve targeted sequencing; I think currently the program uses 454, Illumina, and SOLiD. Which one is used at a particular time depends on the project goals. It sometimes may turn out that one sequencing approach is more desirable than another. The other thing to bear in mind is that many of these sequencing approaches are targeted, so the issue is as much how you are going to enrich for the targeted genes or genomic regions as which sequencing approach you are going to use.
Can you talk about what methods you have looked at for enriching?
Both for the NHLBI and separately at JCVI, we have used different approaches. We have used the RainDance technology (JCVI is one of the early-access partners for that), and we have also tried the Agilent SureSelect technology. Both of those approaches work well. There are project designs for which one might be more appropriate, and other times, the other technology would be appropriate. Clearly, in this field of targeted enrichment, there are many other players. Part of the difficulty is that once you start working with a particular sequence capture or target enrichment approach, you have to evaluate it. So you have to know, in your hands, what's going to work and what's not going to work, and that evaluation process takes a significant amount of time. Clearly, one can't just go ahead and try out as many approaches as one would like. What we were trying to do at JCVI was to select approaches that were significantly different from one another so that we could estimate at least the advantage of one approach over another. Hence, [RainDance's] PCR-based approach on the one hand and [Agilent's] hybridization approach on the other were attempted.