Name: Alex Parker
Position: Principal scientist, Molecular Sciences group, Amgen, Cambridge, Mass., since 2006
Experience and Education:
Associate director (and other positions), Millennium Pharmaceuticals, 1997-2006
PhD in genetics, University of Maine, 1997 (project on population genetics of accelerated speciation in fish species flocks found in the East African Rift Lakes and the Bolivian Altiplano)
As a principal scientist in the Molecular Sciences group at Amgen's Cambridge, Mass., site, Alex Parker has been using 454's sequencing platform for more than a year, primarily to look for mutations in tumor samples from patients enrolled in Amgen's clinical trials.
Last week, Parker spoke with In Sequence about the role of next-generation sequencing in his work, and where he sees a place for emerging sequencing technologies at Amgen. Following is an edited version of the conversation.
Can you give a brief introduction to your work, and how your group fits into Amgen?
My group is part of the Molecular Sciences group at Amgen. Molecular Sciences as a whole is responsible for all biomarker discovery, validation, and deployment activities within clinical development at Amgen. My group, specifically, is responsible for genetics within that larger group, which also does many other things like proteomics and gene expression analysis.
My group's role, really, is to understand the influence of genetics on patient response in clinical trials; that's the core mission. The bulk of our effort, at least in the last couple of years, has been in support of oncology drug development, but we do have involvement in all the therapeutic areas at Amgen.
When did you start using so-called next-generation DNA sequencing technologies? How have you applied them in your work?
We formed the intention to begin using next-generation sequencing early in 2008 and did a review of the platforms that were available then — 454 and Illumina, which had just reached market at that point.
We got our 454 sequencer in November of 2008, so we have been actively using it for a year and a half now. We chose the 454 platform because at the time we were considering which sequencer to buy, the Illumina sequencer was brand-new and was only doing 35-base-pair sequence reads, which we knew would be incompatible with the majority of the kinds of experiments we wanted to do. The majority of the work that we have done has involved mutation screening of tumor samples — primarily archival pathology samples from patients who have been enrolled in our clinical trials in the past. To do this, we have almost exclusively used a PCR-based approach up until now to target the genes and capture the sequence. We knew that for sequencing PCR amplicons, we required a minimum read length of about 200 base pairs to do it efficiently, and that was the driver in selecting the 454 platform at the time that we bought it.
What's the scale of your sequencing projects?
We have probably done up to 1,000 samples in the last year, and we certainly have many more samples promised, coming in both from completed clinical trials and trials that are in process. So we are actively working to improve the throughput to facilitate doing both a larger number of targets and a larger number of samples.
Right now, we typically are looking at experiments targeting between 50 and maybe 200 genomic regions. Obviously, we would like to pursue some of the non-PCR strategies to take that up, ultimately, at least into thousands of regions. I'm still unconvinced that we necessarily would derive a whole lot of benefit from routinely sequencing entire exomes in patients, but I think that remains to be seen.
What target enrichment methods are you considering?
Unlike the situation two years ago, I think there are viable non-PCR methods for capturing sets of targeted sequences available, and we are definitely interested in those.
We have begun the process of evaluating solution-based hybridization methods — Agilent and NimbleGen have fairly analogous methods for attempting to capture the entire exome in solution. These are broad but not terribly specific methods, but we will evaluate those, and we will also evaluate the Olink technology, a sub-genome selection methodology that looks like it may be very promising for our applications.
What about new PCR-based enrichment methods?
We have worked fairly extensively with Fluidigm. Our current PCR strategy uses the Access Array technology to miniaturize the PCR. As far as PCR goes, it's about the most efficient approach we have been able to come up with.
What criteria do you use to evaluate new sequencing platforms, and when do you think you will be ready to switch over to a new instrument type?
Clearly, the 454 platform has a horizon associated with it — it's not going to be able to keep pace with the capacity of the newer technologies.
The primary criterion I use is still read length, which I think most of the providers have now addressed fairly well. We want reads long enough to be unambiguously alignable to the genome, so we need 100-plus-base-pair reads. I also consider flexibility in terms of experimental design, and then cost: both the capital cost to acquire the instrument and the cost per data point to generate the data. Different machines have different strong points and weak points, and there is fairly stiff competition around the cost per experiment and cost per data point, as far as I can tell.
We will still keep using the 454 instrument, though. There are experiments we do less frequently where, if 454 delivers the increase in read length it has promised for later this year, the data would be valuable to us.
By the beginning of 2011, I would like to be doing the high-volume targeted sequencing on a platform that is going to be able to generate more data in a certain time for us. Whether that winds up being one of the Illumina machines — there seems to be increasing diversity of flavors of the Illumina sequencer on the market — I don't know. It could very well also wind up being one of the Ion Torrent machines, assuming that that really does come to market the way they have promised. It's still just a promise, but it's a compelling story. I think the likelihood of there being a Pacific Biosciences machine at Amgen anytime soon is pretty small. I don't think there is anything wrong with the company, but I think given the combination of size, cost, and what its strong points are from the experimental basis, I don't think it would be a fit for us.
How could the very long-read sequencing technologies that are currently under development be of use to you?
I'm aware of a couple of technologies that are either in existence or on their way that hold promise to provide many-kilobase reads. There may be others that are more hypothetical, or more at the development stage right now, like nanopore technologies. But in general, there is the PacBio, which in principle, if they can get a few things worked out, could do 15-, 20-, 30-kb reads, potentially. There is this new Life Technologies approach with the quantum dot-coupled polymerases. If their claims are correct, in theory, it could also produce very long sequences.
One can certainly imagine applications where that would be valuable scientifically. Folks who are interested in studying HIV, for instance, might be really interested in being able to read out the full viral genome sequence from single viral genomes, to find mutation types present in a patient sample, for instance. HLA typing, obviously, is better the longer the read you can do. There is the 454 platform now, but even longer reads would give you even higher-resolution haplotyping. Applications like that, I think, could really be a niche for that kind of sequencing technology.
I don't see it helping me with my primary goals right now. Very long reads wouldn't be a driver for me in thinking about acquiring a new technology. Highly parallel rather than really long reads, that's the driver for me.
In general, how is large-scale sequencing being used at Amgen?
Historically, there has been some moderately large-scale Sanger sequencing just for the purposes of verifying DNA constructs, clones, antibody coding sequences, things like that. There has very recently been the purchase of an Illumina instrument [at our core facility in California]. I haven't heard yet whether it's actually up and running. The goal is to support the discovery organization with, primarily, digital gene expression experiments.
Would you consider outsourcing large-scale sequencing to a service provider — and if so, under what circumstances?
If researchers at Amgen in general, for instance, wanted a large amount of digital gene expression data generated — if they had made many cDNA libraries that they wanted sequenced — it might make sense to outsource to a company with a large installed base of Illumina sequencers that could generate that data rapidly, rather than trying to do it serially, over a long period of time, on our instrument or the Illumina instrument in our core sequencing facility. That would be a context where outsourcing to a large-scale provider would make sense. There are a number of companies, including Agencourt and SeqWright, that run fairly large next-gen sequencing laboratories as a service business.
Regarding Complete Genomics, I don't know. I'm not sure when or if Amgen would really be in the mood to do an experiment that involved sequencing complete genomes of large numbers of individuals. I'm not saying it won't come to pass, but there has not been serious discussion of doing that kind of an experiment yet.
For what reason? Is it still too expensive, or would all that data not be useful?
I think it's expensive, especially in the context of experiments designed to find the kinds of things that Amgen might be interested in, which would be, for instance, new drug targets in the oncology space. My feeling is, you would really need to sequence quite a few genomes to do that, and that would be quite expensive. It's also the case that if you are interested in data of that sort, and mining it for new target opportunities, you can just sit and wait for the Broad Institute to generate those data, put them online, and then you get them for free. You get them at the same time the rest of the world does, but they are free.
We have had casual discussions about other types of studies that have very recently been done. If you have a family that has a rare Mendelian trait that you believe is an interesting model of a disease you would like to be able to treat, it has been demonstrated that if you have the right set of individuals, and you truly have a single-gene disorder, complete genome sequencing can be a pretty rapid way to identify that gene with a fairly small number of affected individuals. But again, we are not aware of any kindreds that would fit that bill and would represent a model that we want to understand. So right now, that's very much a hypothetical kind of experiment that one could do, but we don't have any serious plans.
For your own work, do you think at any time in the future you might want whole-genome sequences rather than targeted sequences to characterize cancer genomes?
It's certainly possible. In theory, if there is no associated cost, having more data is always better than having less data. The thing that remains to be seen, I think, is what the incremental benefit is of going from a couple of dozen genes, like we are doing now; to a couple of hundred genes, using a different targeted technology; to an entire exome; to an entire genome. How much more do you learn?
You can pretty much sit down and figure out how the cost scales, both in terms of doing the experiment and all of the bioinformatics cost, the data storage cost, all the other direct and indirect expenses that you incur by scaling up the experiment in that way. You have got to look at that and then look at the trade-off with the additional benefit you get in terms of being able to do smarter, faster clinical trials, target drugs to a better patient population, et cetera.
I'm not convinced, at this point, that there is going to be enough incremental value in going past, perhaps, the few-hundred-genes level to make it worthwhile. At least not until the informatics part, in particular, becomes much less labor-intensive and much more automated. Very large whole-genome sequencing studies, and really mining all of the information available in the data, require a huge informatics effort, which is not cheap.
You are also involved in the Women's Genome Health Study — a collaboration between Amgen and Brigham & Women's Hospital where you SNP-genotyped 28,000 women. Any plans to sequence their entire genomes?
Probably not. The SNP dataset that we generated for that project continues to bear fruit, both on its own and as part of a whole series of larger meta-analyses of groups of big genome-wide association studies. I'm not sure how much value would be added by doing sequencing on that very large cohort of individuals as well. It's certainly not something I would rule out categorically, but there are no plans right now.