This story was originally published June 17.
Name: Michael Snyder
Title: Professor and chair, department of genetics, Stanford University School of Medicine; director, Center for Genomics and Personalized Medicine, since 2009
Experience and Education:
Professor and chair, department of molecular, cellular, and developmental biology, Yale University; director, Yale Center for Genomics and Proteomics; at Yale 1986-2009
Postdoctoral researcher with Ron Davis, Stanford University, 1982-86
PhD with Norman Davidson, California Institute of Technology, 1982
BA in chemistry and biology, University of Rochester, 1977
Last year, Mike Snyder left Yale University to head the department of genetics at the Stanford School of Medicine, and to direct a new Center for Genomics and Personalized Medicine at Stanford.
The center, which officially opened earlier this month, aims to integrate genomic information with medicine, and large-scale genome sequencing will play an important role in this effort.
In Sequence spoke with Snyder recently about his plans for this new endeavor.
Tell us about the new Center for Genomics and Personalized Medicine.
The goal of this center is to have a serious presence in understanding genomes, and in applying this knowledge to understanding human disease and improving health. That goes all the way from technology development to taking discoveries and translating them into the clinic, which is the ultimate goal.
Stanford, historically, has had some really great people in technology development and basic science. DNA microarrays were invented at Stanford. Stanford is also unique in that it has plenty of biology research, engineering, and a medical school and a hospital that are outstanding and all located very close to one another. The goal of this center is to put these parts together. We also are interacting extensively with the many biotechnology companies in the Bay Area.
One centerpiece of the center is going to be a significant DNA sequencing operation that we are setting up to sequence human genomes with the latest technologies. We are also going to improve methods for mapping structural variations, which is probably the hardest part of human genome sequencing. We plan to apply our expertise in that area, along with straight sequencing. The goal is to be a leader in this area.
We are going to have a pretty serious informatics component to this, too. It's very clear that the informatics is going to be the bottleneck, not the sequencing, and in fact, it could even become the most expensive part of genome sequencing. It's not that way right now, but it will probably head in that direction. In addition, there is a very strong group of computational biologists here at Stanford who are very eager to be part of the center.
We also want to be able to apply the results to human disease. There are obviously a lot of disease areas that people are interested in, but we expect to work in the areas of neurological diseases, like schizophrenia; asthma; inflammatory diseases; childhood diseases; and cancer. We believe the work will have significant impact in the areas of disease prediction, diagnosis, prognosis, and therapeutics.
Finally, we plan to integrate many other -omics activities into the center. These include epigenomics, transcriptomics, proteomics and metabolomics, and global analysis of immune responses. Analysis of genomes is really simply one component in a much broader picture, and integrating all of these activities is essential for comprehensive analysis of biological systems and disease.
We also want to make sure that integrating genome information into the clinic is done in a responsible and appropriate fashion. This is a very sensitive area: with a genome sequence and a modest amount of additional information, you may be able to figure out exactly whom it belongs to. So there will be a policy and clinical practice aspect to this, too.
The center is currently housed in temporary space, and there will be a whole new building in 2011 for the center and related activities. People who will be very intimately associated with the center are Ron Davis, Steve Quake, Russ Altman, and Carlos Bustamante, whom we recently hired, and we have some other people who are very interested, like Atul Butte and Henry Greely. There will also be some new faculty hired as part of the center.
How is the center going to be equipped with sequencing technology, and why did you choose those particular platforms?
The first Illumina GAIIx machines were up and running in September. We currently have a HiSeq 2000, with five others on the way. We also have a Life Technologies SOLiD machine and are in discussions about acquiring more. A PacBio machine is coming in July. In total, there is room for about 35 machines. Illumina was chosen initially because of its easy sample preparation, which is particularly valuable for ChIP-seq and RNA-seq. All platforms are under consideration for genome sequencing because of both cost and accuracy.
Have you considered outsourcing human genome sequencing to Complete Genomics?
We are looking into outsourcing and have a number of samples at Complete Genomics. It may be necessary to do this because our ability to ramp up may not be able to meet our current demand. Nonetheless, we expect to maintain a very significant internal capacity so that we can analyze some genomes right away, and because of all of our genome analysis efforts, including ChIP-seq and RNA-seq.
What types of studies are you planning to conduct? Can you mention a specific example?
We are analyzing a number of samples from patients with disease right now, including those with cancer, asthma, cardiovascular defects, and childhood diseases with birth defects. We are integrating a number of different technologies to analyze these diseases, including genomics, transcriptomics, and proteomics.
We also have very large efforts as part of the [Encyclopedia of DNA Elements] project, mouse ENCODE, and modENCODE, which are primarily focused on mapping transcription factor binding sites in humans, mice, and worms.
How many genomes are you planning to sequence this year?
Probably on the order of 100 to 200 genomes. I expect that we will be sequencing and analyzing many hundreds next year, and ultimately, thousands.
Are you going to conduct mostly research studies or also clinical studies?
Both. First, we need to discover more so we know how to interpret genomes and the underpinnings of disease, and that's where the research side kicks in.
And then, ultimately, on the application side, we would obviously like to apply that to areas of disease prediction, diagnosis, prognosis, and therapeutics.
Where is human genome sequencing most likely going to have an impact on medicine first?
Right now, its major impact will be in understanding disease, such as the recent example with Charcot-Marie-Tooth disease (IS 3/16/2010).
It should have the highest impact in "actionable genomics" — sequencing regions where a particular treatment can be prescribed, such as [human leukocyte antigen] or cytochrome P450. Initial attention will be placed on coding sequences, because those changes are most easily interpreted, but it will need to extend to regulatory regions, since it is likely that many diseases will affect those regions as well.
I can envision a future where everyone gets their genome sequenced and this becomes part of their health package, just like medical history, or a physical. Getting a genome sequenced may become a part of medical practice, just like getting an MRI or any other test, and used to help better understand a disease state.
Will it be necessary to integrate genome sequences with other types of molecular information? How is your center going to address that?
Absolutely. A genome sequence, although powerful and important, only provides a limited amount of information. To really diagnose or understand a disease state, it will be essential to monitor other features, such as the methylome, transcriptome, and, particularly, the proteome and metabolome. These latter features are especially close to indicating a cellular or disease state. Right now, when you get a blood test, approximately a dozen molecules are analyzed. This should really be thousands, or hundreds of thousands.
On the interpretation side, how are you planning to improve genotype-phenotype databases that some people say are inadequate at the moment?
One start is to understand naturally occurring human variation, as well as [variation in] affected individuals. The key to all of this is to have accurate and standardized measurements of phenotypes for large numbers of individuals so that correlations, even if weak, can be made.
Phenotyping will ultimately be the major cost of the human genome project, and molecular phenotypes are best, because they capture thousands of features at once.
Other projects that will be extremely valuable will be research efforts to directly characterize the genome. We need to understand the genome so well that for every genomic variant — for example, SNP, indel, or structural variant — evident at birth, we can predict what its effect will be.
In what areas are you planning to develop new technology?
We'd all like higher throughput in sample preparation, better sensitivity, and advances in single-cell genomics and proteomics. Also, on the computational side, there is a lot to do. The amount of data that is going to come down the road will be so enormous that you have got to be able to handle these data with the best and most streamlined algorithms.
To what extent will your center compete with existing genome centers at the Broad Institute, Baylor, or Washington University School of Medicine?
I think there is plenty to do. I don't see the point of going head-to-head with them, but I could see us working with them in certain ways. I can see us pushing the envelope in technology development and coming up with new practices for integrating the results into medicine. We certainly plan to be in touch with the different centers in all these areas. We already work with many other groups in the 1000 Genomes Project and the ENCODE project. Hopefully, we will have some unique areas that we can push ahead in, too, especially in technology development and on the computational side.
How are new sequencing technologies that provide longer, faster, or cheaper sequence reads going to impact the analysis of human genomes?
These will change things completely. Long reads will enable us to switch from mapping variants relative to a reference genome to de novo assembly. Mapping variants to a reference genome has limitations, particularly in complex regions of the genome that contain repeats or extensive rearrangements. De novo assembly will be much more accurate.
As genome sequencing gets faster and cheaper, I think every sample that is potentially informative about a disease will be analyzed. If a human genome could get sequenced in 15 minutes for $100, why wouldn't it be done routinely and on the spot? Of course, interpreting the result right now is the huge bottleneck, so that needs to be overcome.