At A Glance
Director of PharmGKB, Stanford University Senior Scientist, since 2000
Associate adjunct professor, UCSF department of pharmaceutical chemistry.
PhD, Medical Informatics, University of California, San Francisco, 1987.
Tell me a little about your background in this field and how you got involved in the PharmGKB project.
I got my PhD in medical informatics from UC San Francisco back in the 1980s. The degree focused on using computers in biology and medicine, so you got training in computer science, math, medicine, diagnostics, and biology, as well as chemistry, which I had already had.
So I actually did my work in structure-based drug design, spent several years at UCSF, and in 1987 joined the faculty in the department of pharmaceutical chemistry. In 2000, I was recruited by Russ Altman, the principal investigator for the pharmacogenomics knowledge base, to come be the director [of PharmGKB], so I left my position at UCSF and came to Stanford.
Tell me a little about how you developed an interest in the field. What is exciting to you about it?
Pharmacogenomics, I think, represents the new way medicine will ultimately be delivered to individuals. The opportunity to work in a domain that has a large impact on medicine and how medicine will change, given that I’m not a doctor and don’t work with patients, [was also attractive.]
What is the overall aim for developing the program?
This is an NIH-funded endeavor led by NIGMS, although it has support from many of the institutes. NIGMS saw the need for a public resource of PGx data [where anyone], whether they were postdocs, graduate students, whatever, would have access to this large amount of data. This kind of data exists in pharmaceutical houses, but not in the public domain. So this was meant as a publicly available research tool to help researchers understand how genetic variation among individuals contributes [to] the differences in their reactions to drugs. It’s meant as a central repository for clinical and genetic information, which is novel: there are many genetic databases, such as GenBank or dbSNP, and there are some phenotype databases, but linking genotype and phenotype information was a novel concept. I should tell you that a really great resource to look for the correct vocabulary is the website, www.pharmGKB.org.
How far have you and your group gotten in achieving your goals so far?
I think that in the last year and a half we’ve made remarkable strides. We are now at a stage where we have more than 30 genotype and phenotype data sets. There are a lot of barriers to developing this kind of resource. It’s not just technical barriers in terms of building a knowledge base. Those do exist, but there are also the barriers of HIPAA, which weighed very heavily when this resource began. It was meant as a completely public, open-access resource. PharmGKB is part of a larger PGx research network called the PGRN, and there was resistance early on from some of the investigators, not because they didn’t want to share their data, but because we have to deal with multiple IRBs at multiple institutions, and to say “here, I’m putting it out as a public resource” is a whole different ball of wax. … So there were barriers with regard to that. Then HIPAA came along a few years ago, and you had to become HIPAA-compliant, and that, I think, helped our situation greatly, because it allowed us to legitimately put individual-level data behind security.
Also, it just takes time to get the results in. Many of these groups were funded about four years ago, give or take, and their experiments are just coming to fruition. The genotype ones come in much quicker, because you can genotype more quickly than you can do a clinical research trial, and so we’re just beginning to see some of that data. So I think we’re sitting well. The infrastructure for PharmGKB has matured to a level that makes it stable from our perspective, and it has become much more usable from the standpoint of the users … We’re moving in the same direction as the Protein Data Bank. Early on it was very difficult to get people to deposit their structural information, and it became the sort of thing where, if you published, the editors of journals required that you submit your data, and the NIH required that you submit your data to PDB. I think we’re reaching that kind of maturity, where we can start having those kinds of discussions with editors of journals. If you look at a growth curve, we’re just really starting to take off.
How do you foresee it serving the drug discovery community at large?
I’m going to separate the research side from the clinical side. The clinical side, I think, is many years away, to be frank. This is not designed to feed into your clinical info system at this time. Now, do I think that in 10 or 15 years this will have an impact? I do. I think that’s what makes it exciting for those of us who fit in the interdisciplinary world of biology and medicine. Where we see it now is that we believe it will have an impact by providing an initiative or creative spark to the PGx community. On our front page we have a diagram of four different [kinds of] phenotype data that we think about: clinical outcome, pharmacodynamics and drug response, pharmacokinetics, and molecular and cellular functional assays. And then we have genotype data. The idea is that, with all the different data put together at different levels, a researcher could come in and say ‘Huh, we have a lot of data, for example, [in] pharmacodynamics, and we have genotype, but there’s nothing in pharmacokinetics and there’s nothing in molecular and cellular functional assays. We ought to think about doing something in XYZ that will tell us something about it.’ And we expect it in many ways to be hypothesis-driven stuff, because it will allow them to think about the context. From a drug discovery side, I can look up and see what’s available on a particular drug, or if I’m on a biology side, what’s going on in a particular gene or a particular pathway.
What else is needed to make the resource more useful at this point?
Data. I think the more you get the better. The engagement of the community at large, and in particular the pharma companies and the academic institutions, because it needs to be their resource, and for a resource to become successful you need to engage your users.
There’s the proverbial “Whether you’re in academia or industry, it’s money,” and I think NIH has been very good with regard to that. I will say up front that the entire database is up for renewal. It is posted on the website, and there are due dates. It is an open competition, and we have a statement regarding the intentions of our renewal. We need structured vocabulary, so we’re in the realm of using USP DI, which will help our drug vocabulary, and HGNC for our gene vocabulary. Diseases are a little more nebulous, so for example right now we use MeSH. The database is really heavy in genetics and functional assays, with some pharmacogenetics, and I think down the line, as more clinical data comes in, we’ll have to use some more disease-oriented vocabulary. The other thing we are working towards, which it’s nice to say we now need, is analytical tools. We spent a lot of time building the infrastructure, and now that we have the data and infrastructure, we’ll be providing the analytical tools users need to make sense of the data.