Name: Keyur Patel
Position: Assistant professor in medicine, Duke Clinical Research Institute, Duke University Medical Center, 2005 to present
Background: Clinical associate in medicine, Duke Clinical Research Institute, Duke University Medical Center, 2003 to 2005; research associate, Duke Clinical Research Institute, 2002 to 2003
Name: Joseph Lucas
Position: Assistant professor of statistics, 2007 to present, Duke University
Background: Post-doc fellow, Duke University, 2007; PhD, statistics, Duke University, 2006
Name: J. Will Thompson
Position: Senior laboratory administrator, Duke Proteomics Core Facility, Duke University Medical Center, 2007 to present
Background: Principal scientist, chemical development-analytical sciences, GlaxoSmithKline, 2006 to 2007; principal scientist, disease and biomarker proteomics, GlaxoSmithKline, 2006.
Hepatitis C is the most common blood-borne viral infection in the US, with an estimated 4 million people infected. Worldwide, 170 million people are estimated to have the chronic disease.
The standard treatment includes weekly injections of interferon combined with the oral antiviral agent ribavirin. However, only about 40 percent of patients with the most common subtype of the virus, genotype 1, respond to it, and there is currently no way to identify those patients in advance.
At this year’s annual meeting of the American Association for the Study of Liver Diseases, held in San Francisco earlier this month, researchers at Duke University presented preliminary findings on proteins that may be used to gain that information.
They looked at the serum of 10 patients with genotype 1 hepatitis C who received treatment and responded well; 10 patients who did not respond; and 10 with either genotype 2 or genotype 3 hepatitis C who responded well to the treatment.
The researchers broke down the proteins in the serum into peptides and used liquid chromatography-mass spectrometry to sort the peptides. They then aligned and quantified the peptide signals with Rosetta Elucidator software and, through factor modeling, discovered three factors representing clusters of proteins that can predict in nine out of 10 cases which patients will or will not respond to interferon/ribavirin.
The work is still in the early stages, and the researchers have not yet published their results.
ProteoMonitor spoke with three of the researchers recently in two interviews. Below is an edited and combined version of the conversations.
How much do we already know about who may respond to treatment and who may not?
KP: About 3 million [people] in this country — that’s a conservative estimate — have hepatitis C infection. Not everyone has been treated and not everyone has been diagnosed, certainly. Maybe 15 to 20 percent of people [who have it] have been diagnosed, and not everyone needs treatment because there’s a slow natural progression of disease in a number of patients.
But the ones who are picked for treatment [have] only two options for treatment basically. There’s a combination of interferon, which is a naturally occurring substance [that’s] been genetically engineered, and what it does is boost the immune system and causes a whole variety of side effects. It works through various mechanisms stimulating the immune system to fight the virus and get rid of the chronic virus from the liver.
And it’s combined with ribavirin, which is again an antiviral. The combination of the two does result in a number of side effects, and it’s effective in only about 50 percent of patients. … Of those 50 percent who do respond, many experience a significant impact on their quality of life during this one year of treatment.
So basically, our treatment options are limited for chronic hepatitis C patients.
Do we know why half of the population doesn’t respond to treatment?
KP: Many patients aren’t eligible because the side effects of interferon are multiple and there’s a whole number of people who aren’t treatment-eligible — people who have bad kidney functions or psychiatric disturbances. Interferon can precipitate psychiatric disturbances.
The ones who don’t respond, we don’t really … have good reasons why. People have tried to look at genetic make-ups, or tried to predict who’s not going to respond based on certain factors.
[For example] African-Americans only have a 20-percent response rate compared to Caucasians who have double the chance of responding, and no one really knows why. It may be due to some inherited interferon resistance.
There are multiple factors: if you’re older, you’re male, or you have a higher degree of scarring in your liver, you don’t respond as well. Or if you have a high viral load, you don’t respond as well.
So we have these soft demographic features, basically, which help you predict who’s going to respond, who’s not going to respond, but they’re not very accurate and they vary from population to population.
What have the genetic tests told us about who responds and who doesn’t?
KP: People have looked at this … and they’ve found that perhaps [it’s due to the] heterogeneity of the virus — the virus replicates at around 10^12, or a trillion, copies, so it’s got a very high turnover rate per day, so you get a lot of errors, you get a lot of mutations, if you like.
People have said the number of mutations may correspond with that, but it’s not really borne out in studies.
But people have looked at genetic studies and they’ve found that … the only consistent feature is that people have … interferon-stimulated genes, which are overexpressed at the outset, so basically your body is producing maximum interferons through these stimulated genes. As a result, if you get any more interferons, it doesn’t seem to do any further good.
That’s about as far as they’ve gotten, and people have looked at single nucleotide polymorphisms and stuff with various cytokines, and interferon gammas, and interferon genes, a number of things, but again, [the results] have not been very consistent and not really validated in other populations.
So maybe for a subset of populations, these things may hold true, but certainly not for the majority of patients.
It sounds like what you’re saying is that the results from these tests aren’t something that we can take action on.
KP: Right, certainly not clinically, that’s for sure.
Run down the proteomics work that you did.
WT: We started off … with 30 patients, each sample run in triplicate, and they were 3X randomized in the order in which they were run, so there should be a limited amount of run effects due to sample order.
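The randomized triplicate design Thompson describes can be sketched as follows; the sample names and the fixed seed are illustrative, not details from the study:

```python
import random

# 30 hypothetical samples, each injected in triplicate (90 runs total).
samples = [f"patient_{i:02d}" for i in range(1, 31)]
injections = samples * 3  # three technical replicates per sample

# Shuffle the full injection queue so replicates are spread across the
# acquisition order, limiting systematic run-order effects.
rng = random.Random(42)  # fixed seed only so this sketch is reproducible
rng.shuffle(injections)

# Final acquisition list: (run number, sample) pairs.
run_order = list(enumerate(injections, start=1))
```

The point of shuffling the whole 90-injection list, rather than running each sample's replicates back to back, is that instrument drift over the multi-day acquisition then affects all samples roughly equally.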
The analysis was done using essentially a Waters platform, which is nano-Acquity LCs and Q-TOF Premier running in the MSE mode, so alternate scanning, high-low energy. We’ve run this in … a two-hour gradient, two-and-a-half hour cycle time between samples, so on the order of 12 days of continuous data acquisition for this particular instrument.
The data was processed in two different ways. We used Waters PLGS 2.3 … for qualitative identifications. And then for quantitation, the raw data was brought into Rosetta Elucidator, which was used to do the accurate mass and retention time alignment of the datasets of the analyses across all 90 runs.
Overall for this study, we have used what [Rosetta Biosoftware] calls ProteinTeller and PeptideTeller … to filter all of the identifications across those 90 runs. … Out of that we’ve identified from these particular single dimension LC-MS/MS runs on the order of 1,083 peptides in the first-pass analysis. And there are approximately 190 proteins. That includes one-hit wonders. I think 90 of them were one-hit wonders.
The key part of this is not based on number of identifications, but is based on the quantitative aspect because the statistical analysis was not performed on only the identified peaks.
There were right around 60,000 isotope groups … so the downstream statistical analysis is based on 60,000 of these isotope groups quantitated across all 90 analyses.
So bringing in the raw data, generating the images, the alignment of these LC-MS/MS runs, and then the isotope group quantitation was all done within Elucidator. Elucidator has some very nice export functions, so we can generate the quantitative data in a tabular format and pass it off to other statistical tools.
The standard statistical approach for looking at this type of differential proteomics data uses hypothesis testing-type approaches where you say, ‘This isotope in and of itself, how much does it change and what is the p-value for that change?’
And it treats everything individually.
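The per-feature hypothesis-testing approach described above can be sketched as follows. The isotope-group names and intensity values are invented for illustration, and a real analysis would convert the t-statistics to p-values via a t-distribution or permutation test:

```python
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t-statistic for two independent samples."""
    va, vb = variance(a), variance(b)
    return (mean(a) - mean(b)) / ((va / len(a) + vb / len(b)) ** 0.5)

# Toy intensities for two hypothetical isotope groups,
# responders vs. non-responders.
responders     = {"iso_1": [10.1, 10.3, 9.9, 10.2], "iso_2": [5.0, 7.1, 4.2, 6.3]}
non_responders = {"iso_1": [8.0, 8.2, 7.9, 8.1],    "iso_2": [5.5, 6.0, 4.8, 6.1]}

# The standard approach: score every isotope group individually,
# then rank features by the size of their test statistic.
scores = {g: welch_t(responders[g], non_responders[g]) for g in responders}
ranked = sorted(scores, key=lambda g: abs(scores[g]), reverse=True)
```

This is exactly the "treat everything individually" framing the researchers moved away from: each feature is tested in isolation, with no notion of peptides or proteins changing together.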
And one of the really smart things that Joe Lucas brought to the table was this idea that we shouldn’t expect all these things to change … first, by much necessarily, and secondly, we shouldn’t expect them to change individually. They’re going to change as a group.
We’ve got multiple peptides coming from the same protein. We’ve got multiple proteins in the same pathway, so we should expect to see groups of things going up and down in concert and that should be more important than saying, ‘What is the single most indicative isotope group or the 20 most indicative isotope groups?’
So that is really a paradigm shift in the way we’re looking at it, and [Joe Lucas] brought that over from his work on microarrays.
KP: A French study looked at using SELDI-TOF, and they came up with, I think, six peaks, and if you added in the amount of scarring and the genotype, they came up with an accuracy of about 80 percent [trying to predict] who responds and who doesn’t respond.
They couldn’t tell you any more about the peaks, so we wanted to go with a much more difficult technique in terms of time, using the … gel-free, label-free type of platforms to try to really nail down which proteins and peptides [can be identified] and link them to a biological pathway to see if we can really work out … if there is a protein signature baseline that predicts responses to standard-care therapies.
You can come up with a series of peaks and say [there’s] this response, but there’s really no biological plausibility to it, so we really wanted to use this platform to uncover new pathways.
JL: There were [about] 100,000 features which were collected into isotope groups. So there were about 60,000 of those, and essentially what we did was group the ones that co-expressed under the assumption that they are either from the same protein or from proteins that are in the same biological pathways. By doing that, you can reduce the dimension quite a bit and you end up with collections of isotope groups that when you look at them … it’s very clear that …across the samples, they do the same thing. They go up and down together.
From collections like that, we can build vectors that describe that variation, and it’s those vectors that we used to do the predictions, to build the models.
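A toy sketch of that grouping idea, using a simple Pearson-correlation threshold rather than the Bayesian factor modeling actually used in the study; the data, the 0.9 cutoff, and the per-sample mean as a stand-in for a factor vector are all illustrative:

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Toy data: intensities of four isotope groups across six samples.
# iso_a/iso_b co-vary (e.g., same protein); iso_c/iso_d co-vary.
data = {
    "iso_a": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "iso_b": [1.1, 2.2, 2.9, 4.1, 5.2, 5.9],
    "iso_c": [6.0, 5.0, 4.0, 3.0, 2.0, 1.0],
    "iso_d": [5.8, 5.1, 3.9, 3.2, 1.9, 1.1],
}

# Greedy grouping: join a cluster when a feature's profile is strongly
# correlated with that cluster's first member.
clusters = []
for name, profile in data.items():
    for cluster in clusters:
        if pearson(data[cluster[0]], profile) > 0.9:
            cluster.append(name)
            break
    else:
        clusters.append([name])

# Summarize each cluster as one vector (here, the per-sample mean),
# a crude stand-in for the latent-factor vectors used for prediction.
factors = [
    [mean(vals) for vals in zip(*(data[n] for n in cluster))]
    for cluster in clusters
]
```

The dimension reduction is the point: 60,000 isotope groups collapse to a handful of cluster-level vectors, and those vectors, not individual features, feed the predictive model.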
How many proteins were you able to identify?
JL: Fifty-four [proteins and] … 236 of the peptides out of 63,000.
KP: The identities are still in progress. We’ve only identified 5 percent of these species basically.
Are these proteins that you’ve identified so far the ones that you think may be implicated in drug resistance?
JL: Not necessarily. The identifications require a lot of the proteins to be in the sample. Those are just the ones that were abundant in the sample. Some of them show up in the predictive signatures, and some of them do not.
The appealing thing to me about the unidentified peaks is that they may correspond to things that are not in the database. They may correspond to modifications of proteins that are in the database but wouldn’t be identified because they’re modified in some way or another.
We don’t have to know what they are in order to use them to do these predictions.
Do you have any information yet about their pathways?
JL: We know that the few that have been identified and are being used by the model to do this prediction … have relationships to liver functions and such. We have some confidence from the biology that probably what we’re doing is right. But that’s about it. We don’t have any sort of real, clear [information].
What about interactions? Have you gotten to the point where you’ve identified interactions between these promising proteins?
KP: We’re using Ingenuity Pathway tools to help us determine these interactions. We’ve only identified very few proteins. The interactions that we’re seeing at the moment are the usual suspects … these aren’t surprising. They’re established in hepatitis C.
The key thing … was basically we came up with three factors that probably represent many hundreds of peptides, if not thousands, and with the model we built, you could accurately predict for 30 out of 33 patients who [will respond to therapy and who will not], which is probably the best thing we’ve seen in terms of predicting responses at the outset, before anyone received a drop of medicine.
And we’ve verified this in our second cohort of 27 patients. We’re getting good sensitivity and specificity in the performance of this model, but identification and biological pathway links are still in progress.
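Sensitivity and specificity of such a predictor come from the validation cohort's confusion counts; the labels below are hypothetical, not the study's actual 27-patient data:

```python
def sensitivity_specificity(truth, predicted):
    """Compute (sensitivity, specificity) from paired boolean labels,
    where True means the patient responded to therapy."""
    tp = sum(t and p for t, p in zip(truth, predicted))
    tn = sum((not t) and (not p) for t, p in zip(truth, predicted))
    fn = sum(t and (not p) for t, p in zip(truth, predicted))
    fp = sum((not t) and p for t, p in zip(truth, predicted))
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical cohort: 10 responders and 10 non-responders,
# with the model getting 9 of each right.
truth     = [True] * 10 + [False] * 10
predicted = [True] * 9 + [False] + [False] * 9 + [True]

sens, spec = sensitivity_specificity(truth, predicted)
```

Reporting both numbers matters here because the cost of the two errors differs: a false positive commits a non-responder to a year of side effects, while a false negative withholds a potentially curative therapy.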
As you’re moving ahead, what kind of tweaks are you making to this proteomics platform?
WT: The traditional biomarker discovery paradigm is that you start off with a small discovery cohort, and you do it using an open platform, and then you move to an MRM-type analysis and then preferably to an antibody assay platform for validation.
Although we came into this thinking along that paradigm, some things that challenge that paradigm have come out of this. The first is that in order to make an accurate and predictive model, I believe 650 of these isotope groups were used in the statistical analysis, so it’s still a relatively small subset of 60,000. But it might be more than would be feasible to go after using an MRM-type platform.
Depending on the numbers of these … that we get identifications for, [that] will then affect our ability to look at an MRM or an ELISA platform for further validation.
Currently, our focus has shifted a little bit because of the number of analytes that we need to track. MRM is great, but ordering 650 stable isotope-labeled peptides and having a mass spectrometer that can even track 1,300 peptides and 2,600 transitions or more in a single analysis is still quite cutting edge. I’m not sure there’s an instrument out there that can do that.
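Thompson's peptide and transition counts follow from simple doubling; a sketch of the arithmetic under the stated assumptions (one labeled standard per target peptide, two transitions per monitored species):

```python
# 650 target peptides drawn from the predictive isotope groups.
target_peptides = 650

# Each endogenous peptide is paired with a stable isotope-labeled
# standard, doubling the number of monitored peptide species.
monitored_peptides = target_peptides * 2

# At a minimum, two MRM transitions per monitored peptide species.
transitions = monitored_peptides * 2
```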
JL: Some of the improvements are going to have to do with the pre-processing of the plasma. We are going back and forth about whether we’re going to use a targeted system, but it’s not clear whether we’re going to do that or not yet.
Your approach was able to predict in nine out of 10 cases those patients who will respond to interferon/ribavirin treatment and those who won’t. Any insight into what’s going on in the one case where the proteins or peptides don’t have predictive value?
JL: It’s not actually clear that they don’t have predictive value. It’s just that issues with aligning the mass-spec samples and issues with noise in the system may be hiding their signals. …It’s difficult to tell whether it’s technical noise or biological noise.
KP: It’s about as good as you get for a biological outcome. You’re talking about taking a group of patients, and trying to predict who’s going to respond, who’s not going to respond to treatment.