Name: Dieter Wolf
Position: Director, NCI Cancer Center Proteomics Facility, Burnham Institute for Medical Research, 2007 to present; professor, Signal Transduction Program, Burnham Institute, 2007 to present.
Background: Associate professor of molecular oncology, Harvard School of Public Health, 2003 to 2007; director, Harvard NIEHS Center for Environmental Health Proteomics Facility, Harvard School of Public Health, 2002 to 2007.
In a study published March 9 in the online version of the journal Methods, a research team from the Burnham Institute for Medical Research describes an approach they developed for the proteomic profiling of fission yeast.
The method optimizes steps in an HPLC-tandem electrospray ionization mass spectrometry workflow, resulting in the identification of about 4,600 proteins, or about 95 percent of the roughly 5,000 predicted proteins in fission yeast.
The current study builds on work published in 2007 in which the team used a similar approach to identify about 30 percent of the yeast proteome.
While the method was used for analysis of fission yeast, the authors said that it also works with more complex organisms and tissue. ProteoMonitor spoke last week with the corresponding author, Dieter Wolf, about the study. Below is an edited version of the interview.
How is your approach novel or different from what other people are doing?
I just think we have tweaked it in a way that allows for the particular application that I was interested in, which is a complete profiling of [the proteome of] that particular yeast, fission yeast.
It's just tweaked such that it can do that.
It is a 2D separation. The first separation is offline, which is, of course, what people have been doing, but then we worked with the company from which we bought the HPLC to really optimize the gradients, and that has given us a substantial number of proteins in an individual run. It's about 3,600 or so.
If you do that a couple of times, I think we get essentially complete proteomic coverage.
What were some of the tweaks you made in your approach?
It basically has to do with the chromatography. There wasn't any particular concept behind [the way the gradients were run]. It was sort of a trial-and-error procedure but it just worked out the way that we wanted.
Now does that mean it's the best possible [approach]? Maybe not, but it certainly is what has worked for us, and it would also work for us with even more complex samples from humans and mouse.
The proteins that you've identified are unmodified proteins, correct?
Our initial search was done with all possible modifications and then the reviewer wanted us to just look at a subset of modifications. … And we did it, but the overall number of proteins does not vary very much. We are using the standard modifications, and then I think we also looked for ubiquitinated proteins.
We are using static cysteine carboxymethylation, which is standard, and differential methionine oxidation, which is also standard. And then beyond that … we also do a second search where we put in other parameters, mostly phosphorylation and ubiquitination, but it does not affect the numbers very much.
The investigators can basically choose their own parameters. We like to work with a false-positive rate for peptides of about 2 percent, but there is no hard standard for that. You can go lower, down to zero percent, or you could go higher.
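A peptide false-positive (false discovery) rate like the 2 percent figure mentioned here is commonly estimated with a target-decoy search, in which spectra are matched against both the real database and a reversed "decoy" database. The sketch below illustrates that general idea only; it is not the group's actual pipeline, and the function names and scoring are hypothetical.

```python
# Target-decoy FDR sketch: each peptide-spectrum match (PSM) carries a
# search score and a flag saying whether it hit the decoy database.
# Estimated FDR at a score cutoff ~ decoys passing / targets passing.

def fdr_at_cutoff(psms, cutoff):
    """psms: list of (score, is_decoy) tuples; returns estimated FDR."""
    targets = sum(1 for s, d in psms if s >= cutoff and not d)
    decoys = sum(1 for s, d in psms if s >= cutoff and d)
    return decoys / targets if targets else 0.0

def threshold_for_fdr(psms, max_fdr=0.02):
    """Lowest score cutoff whose estimated FDR stays under max_fdr."""
    for cutoff in sorted({s for s, _ in psms}):
        if fdr_at_cutoff(psms, cutoff) <= max_fdr:
            return cutoff
    return None
```

Raising `max_fdr` admits more (but less reliable) peptide identifications, which is the trade-off Wolf alludes to when he says investigators can choose their own parameters.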
Could you also put in parameters to look for PTMs, isoforms, or splice variants?
Of course, the database searches can be done with any set of parameters, but that's standard.
In terms of the abundance levels of these proteins you found, were you able to get past the higher-abundance proteins to the lower-abundance ones?
In 2007, we had a paper where we profiled about 30 percent of the fission yeast proteome and we used for quantification a method that is called spectral sampling, which has its drawbacks. It's very well accepted but it clearly has limitations in the low-abundance range, but what it allows you to do is semi-quantitation of proteins within a sample.
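Spectral sampling, often called spectral counting, infers relative protein abundance from how many MS/MS spectra map to each protein, usually normalized by protein length so long proteins are not overcounted. Below is a minimal sketch of one common variant, the normalized spectral abundance factor (NSAF); this may differ from the exact metric used in the 2007 paper.

```python
def nsaf(spectral_counts, lengths):
    """Normalized spectral abundance factor: divide each protein's
    spectral count by its length, then normalize so the values sum
    to 1 across all proteins in the sample (semi-quantitation)."""
    saf = {p: spectral_counts[p] / lengths[p] for p in spectral_counts}
    total = sum(saf.values())
    return {p: v / total for p, v in saf.items()}
```

Because the metric depends on stochastically sampled spectra, it degrades for low-abundance proteins that yield only a handful of counts, which is the limitation Wolf describes.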
So I think we are now fairly confident that enolase, for example, is the most abundant protein in this particular organism.
Now on the low-abundance end, it's different. We cannot say with certainty what is the lowest abundant protein in fission yeast, but since we have identified 4,600 of the 5,000 predicted proteins, we are sampling across the entire abundance range. … Now would we be able to, if we compared a wild type and a mutant — think of that as a normal versus a diseased — compare differences in expression levels at the very low-abundance range? Probably not, but neither could any other technique that's out there so far.
I mean even the isotope-labeling techniques, iTRAQ and SILAC and so on, as long as the quantification relies on one or two peptides, it's just not going to be very accurate.
Is the work described in this paper a refinement, then, of the 2007 study?
Yes, in many ways.
We were limited at the time in the sensitivity of the mass spectrometry, and I guess that was the decisive factor. In the 2007 paper, we did an extensive data analysis that we did not include this time. We also compared messenger RNA abundance to protein abundance and did pathway analysis and did a comparison between fission yeast and budding yeast, and there were a lot of interesting points in that paper.
Of course, we could have redone [that analysis] with what we think is about 95 percent of the proteome, but we didn't. It takes a lot of time and the essential points we made in the 2007 paper [are] not going to change very much.
The correlation between protein and messenger RNA is going to come in at around .6 as it does for other organisms, although at the time, ours was the most extensive comparison between protein and messenger RNA, simply because we had the most protein. If you compare that now for all of them, I don't think it's going to change very much.
In terms of significance, the way I look at this is when you think back to the genome sequencing era, the first [eukaryotic] genome that was sequenced, I think was budding yeast, and they published it in Nature. And then the second genome comes along and it's published somewhere else.
And I think that is similar to what has happened in this case — that Matthias Mann had a paper in October in Nature showing the same coverage that we have for the budding yeast proteome, and ours is sort of the second finished proteome of a eukaryotic organism, and that is why we are fairly satisfied with the study even though in terms of scientific functional knowledge, it's not overwhelming. … It's sort of a framework from which we can go on now. Now we can do a lot of interesting things.
The value for me now is that I can profile all kinds of different mutants versus wild-type. I can basically use this in a similar way as messenger RNA profiling has been used [but] at the protein level.
We already have a study underway, about halfway done, looking at the oxidative stress response in fission yeast, where we're also comparing messenger RNA expression to protein expression, and we are already seeing groups of proteins that are changed upon oxidative stress where the messenger RNA level just doesn't change.
So these things have not been seen before, and that's why we wanted to establish that. We also want to use it to look at global protein stability and things like that, potentially protein synthesis [on] a global scale.
You said the 2007 study was able to map out about 30 percent of the proteome. How have you been able to raise it to 95 percent with this study?
We switched from a [Thermo Fisher Scientific] LCQ instrument to an LTQ Orbitrap, and it's a quantum leap of difference.
In terms of sample preparation, we used exactly what we had published [in 2007]. … In 2007, [for] our first-dimension fractionation, I think we used five different methods from 2D liquid chromatography to isoelectric-focusing to 1D PAGE separation, and some other isoelectric-focusing in-solution methods, just to compare things and figure out which is best.
In the end, we threw all of the data together in order to make it to 30 percent [coverage]. Nowadays, all we do is a 2D liquid chromatography step and no other pre-fractionation. In the first paper, I think we had … altogether something like 50 different fractions in a process that took months. Now we just do a 1D run, take 24 fractions, and then re-inject each one for three hours.
It still takes [a] substantial amount of time.
How much are we talking about?
If you do the first dimension separation into 24 fractions … run them on a three-hour gradient, we're talking about 72 to 96 hours per sample for one run. But in order to get the coverage we were getting, we need approximately three runs.
So we're talking about hundreds of hours.
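The instrument-time arithmetic behind that bottleneck follows directly from the numbers given above (24 first-dimension fractions, three-hour gradients, about three runs per sample):

```python
fractions = 24          # first-dimension fractions per run
hours_per_fraction = 3  # second-dimension gradient length, in hours
runs = 3                # replicate runs needed for ~95 percent coverage

hours_per_run = fractions * hours_per_fraction  # 72 hours of gradients
total_hours = hours_per_run * runs              # 216 hours, before overhead
print(hours_per_run, total_hours)
```

With column re-equilibration and sample handling pushing each run toward the quoted 96 hours, three runs indeed lands in the "hundreds of hours" range.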
Yes, that is where the bottleneck is, and to my mind there is only one way at the moment … to get around that bottleneck. And that is just to get more instruments.
In the long run, who knows how we are going to do it in five or 10 years, but for now, in order to make it a massive effort, the easiest solution would be, instead of one Orbitrap, just run five or 10 in parallel. The problem is that they are so expensive.
Has the reproducibility of this method been proven?
Yes … of course we have run samples repeatedly. We have technical replicates, but everybody knows that on an ion trap instrument, the reproducibility of a technical replicate is not going to be more than about 80 percent, and that is what it is in our case; it has to do with the random nature of selecting ions for fragmentation.
In terms of biological replicates … let's just assume we have 100 percent coverage of the proteome in what we do, and if we do it again we have … let's say 95 percent coverage, then reproducibility, of course, is going to be very good.
The issue when we had 30 percent coverage was: Would we pick up the same 30 percent of the proteome each time we do the analysis? No, that was not the case. But now that we are close to 100 percent, we see the same proteins each time we do the analysis.
Now, in terms of the quantification, how reliable that is, it's too early for us to say, because we are now doing the oxidative stress response. We have two biological replicates, and we have messenger RNA data, so we know the experiment as such was very reproducible; now it's going to be interesting to see how reproducible the proteomic analysis will be.
I have actually no reason to doubt that it will be as good as the messenger RNA analysis.
The paper says that the remaining 400 proteins your method did not identify are expressed only in the yeast's mating state.
That's the assumption. … Before making that statement we should have taken those 400 proteins and looked at [whether] they are the ones, or are they enriched in the ones that are induced during mating. I haven't done so.
That's just more of a conceptual assumption. There will be no one state of a cell or an organism where the entire predicted proteome is going to be expressed. If you use a liver cell versus a heart cell and they each have the [same] 25,000 genes … they are not going to have the same 25,000 proteins.
If you think about coverage, we need to get away from putting down as the reference the predicted number of proteins. If you see 10,000 proteins or so in a myocyte, that may well be 90 percent of what's in that cell, but in terms of theoretical coverage, it's going to be less than 50 percent.
You said that your approach is suitable for other more complicated organisms. Have you used it on other organisms? Your paper mentions it's suitable for mouse cells.
Well, any complex eukaryotic proteome can be tried. The difficulty is the dynamic range, but that's always the difficulty. … And then [the dynamic range is] certainly much greater in a human serum or mouse serum sample; plasma is probably the most difficult sample to work with in terms of covering its protein content, but that's well established.
We are now planning a study in embryonic stem cells as they undergo differentiation. I think that should be quite interesting. And then we have a number of collaborations within the institute from people giving us samples from a variety of sources, wild-type and knockout mice.
We are routinely seeing between 8,000 and 10,000 proteins … in mouse and human.
Are any of these new proteins?
I have not looked at the data so I don't know, but the thing is that you really cannot identify new proteins with that approach because … the way you identify the peptide is by looking into a genome database to see whether there is a match, so we rely heavily on the annotation of the genome, [and] if a gene isn't annotated as being expressed into proteins, you're not going to see it.
I know that people have sequenced the genome [for fission yeast] and they have done gene predictions and designated about 40 open reading frames as putative open reading frames, or as pseudogenes, and in quite a few cases … maybe in half of the cases, we have obtained clear evidence in the mass spectrometer that that gene is expressed into protein.
That is obviously very helpful to annotate some of these dubious open reading frames, as they are called.
Are you still refining this method or is it finished?
The method is ready for us. … We are working on two things [now] — to get phosphoproteomics going, and that is already looking very good. But that is also something that is, in theory, pretty well established. It just needs to be tweaked.
The other thing we are working on is better ways for quantification that do not rely on stable isotope labeling, just because it's impossible to do on a human. We cannot feed a human for a year isotopically labeled arginine and then take a sample.
What we would do is more of a bioinformatics approach to use the same metric that people are using for relative quantification by isotope labeling, which is the peak area intensity of the ion basically, and then compare that across samples.
So if you have identified a peptide in two samples, then we're going to go into the ion chromatogram to see what the peak area intensity was and whether there was a difference. …We have written a software program called QPik and we are now evaluating it in a real-world setting with the oxidative stress sample.
That would then allow us to basically run any dataset we have through that software and then do a relative quantification across different samples.
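QPik itself is not publicly documented here, but the label-free comparison Wolf describes can be sketched generically: integrate the extracted-ion chromatogram (XIC) peak for the same peptide in each sample and compare the areas. The helpers below are purely illustrative, not QPik's actual implementation.

```python
def peak_area(xic):
    """Integrate an extracted-ion chromatogram, given as a list of
    (retention_time, intensity) pairs, with the trapezoidal rule."""
    area = 0.0
    for (t0, i0), (t1, i1) in zip(xic, xic[1:]):
        area += (t1 - t0) * (i0 + i1) / 2.0
    return area

def relative_change(xic_a, xic_b):
    """Ratio of peak areas for the same peptide in two samples,
    e.g. wild-type (a) versus stressed or mutant (b)."""
    return peak_area(xic_b) / peak_area(xic_a)
```

In practice such comparisons also require retention-time alignment and intensity normalization across runs, which is where most of the real bioinformatics effort lies.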