At A Glance
Name: John Yates III
Position: Professor of Cell Biology, Scripps Research Institute, since 2000.
Background: Developed SEQUEST and Multidimensional Protein Identification Technology (MudPIT) for shotgun proteomics.
Associate professor, University of Washington, 1992-2000.
Post-doctoral fellow, California Institute of Technology, Leroy Hood’s lab, 1989-92.
PhD in chemistry, University of Virginia, Don Hunt’s lab, 1987.
BS in zoology, University of Maine, 1980.
Tell me how you came up with the MudPIT technique.
It all goes back to SEQUEST. It became very clear, with the ability to quickly identify proteins, that one could take advantage of the mixture analysis capability of tandem mass spectrometry to fashion a new way to identify proteins in mixtures. Some of the initial studies that we did using single dimension chromatography made it clear that we needed to get higher [resolution] in separations. I was always intrigued by multi-dimensional liquid chromatography and I started thinking about it in the context of the mass spectrometer. Most of the methods that had been published were rather complicated configurations for multi-dimensional chromatography, so the idea was to develop a method that would enable you to take advantage of the sensitivity of the mass spectrometer. So we came up with a way to do multi-dimensional chromatography using a biphasic column. We’ve been building on the method ever since: improving the sensitivity, getting better mass spectrometers every 24 months, and working out the informatics.
What are the advantages and disadvantages of top-down versus bottom-up approaches?
Top-down’s got some attractive features in that you’re potentially looking at the functional protein. The problems associated with it are that it’s not very straightforward to separate and enrich for every single protein of the cell. And even if you were able to get that far, the methods that people use to try to fragment these things are not particularly general. You can’t just take any old protein, stick it in and expect to get reliable information. [Also], this is mostly done on FT-MS and the problem with FT-MS is that it hasn’t been very easy to do those experiments. I think that’s improving, but it’s not like you could fashion a large-scale and high-throughput method centered on top-down at this point in time.
So the advantage of bottom-up is that you get high-throughput coverage of a larger variety of proteins?
Right. The tandem mass spectrometry process on peptides has been around for at least 20 years. So a lot of the mechanics and automation have been worked out, and it’s a generally reliable way to generate sequence information for peptides. There are limits to it, but in general most peptides will fragment and give you at least partial information if not complete information. The drawbacks are the complexity that you produce when you do the experiments: when you digest the set of proteins, you increase the number of peptides by a factor of at least 20, if not 50. And the informatics are pretty intensive to analyze all that data.
Are trends moving toward doing more top-down approaches?
I don’t know that that’s going to be the case, because [of] the instrumentation and expertise required to do top-down in its present form: it’s pretty expensive and the people need to be pretty sophisticated. I don’t know if you’ve ever seen one of these FT-MS instruments; they are quite a tangle of cables and wires and everything else. Most of the experiments that people are doing are being done on research-grade instruments, not commercial-grade instruments.
So you don’t think that anytime soon top-down is going to be more prevalent?
No. I think that the people who are pushing that method are doing what they’re supposed to be doing: they’re working through the problems. What they eventually have to do is convince the people who are actually trying to solve the problems that they can solve the problems [with top-down]. That’s probably going to take some good examples of having solved problems that they couldn’t have solved by an easier method.
Your lab often provides mass spec services to other labs’ projects …
We’ve got a grant from NCRR that enables us to collaborate with the yeast community. Generally my lab will collaborate with people who are studying particular problems that either I find interesting or challenging. Right now we’re interested in large structures of the cell.
What is your lab working on now?
We’re working on a lot of microorganism proteomics — mostly Plasmodium falciparum and other species of Plasmodium. We’re also working on mammalian proteomics, in particular doing quantitative proteomics in mammalian model systems like rats. We continue to do technology development including informatics and software. The one I’d like to give you some details on, I can’t at this point because it’s been submitted for publication, [but] it concerns quantitative proteomics in mammalian systems. It [was submitted to] a high-end journal.
Tell me about the subtractive proteomics paper that recently came out in Science (Science 5 Sept. 2003 Vol 301; 1380-2).
That was a nice example of developing approaches to solve a problem in proteomics that was a particularly tricky problem. The nuclear envelope is contiguous with the ER, so the question is, how do you analyze that and figure out what is really in the nuclear envelope? The solution was to isolate the microsomal membrane fraction — which was primarily ER — and analyze that, and we did that using about three times the quantity of nuclear envelope proteins so we could make sure we could pick up as many of the low abundance proteins as possible. We pulled out about 3,000 or so proteins. Then we analyzed an enriched nuclear envelope fraction and just subtracted out the proteins that were common between the two. That was probably the only way to get at it — you’ll never be able to isolate a pure fraction of the nuclear envelope free from ER.
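[Editor’s illustration: the subtraction step described above amounts to a set difference between two protein lists. The following is a minimal sketch under that assumption, not the lab’s actual software; the function name and the protein identifiers are hypothetical and chosen only for illustration.]

```python
def subtract_proteomes(enriched_fraction, background_fraction):
    """Return proteins identified in the enriched fraction but absent
    from the background fraction (a plain set difference)."""
    return set(enriched_fraction) - set(background_fraction)


# Hypothetical identifiers, for illustration only (not real data).
microsomal_membranes = {"ER_protein_A", "ER_protein_B", "shared_protein_C"}
nuclear_envelope_prep = {
    "ER_protein_A",
    "shared_protein_C",
    "NE_candidate_X",
    "NE_candidate_Y",
}

# Proteins left after removing everything also seen in the ER-rich fraction.
candidates = subtract_proteomes(nuclear_envelope_prep, microsomal_membranes)
print(sorted(candidates))
```

[In this sketch, only the two identifiers unique to the nuclear envelope preparation remain. Analyzing the background fraction more deeply, as described above, makes the subtraction more conservative, because more low-abundance ER proteins end up in the list being removed.]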
What are you working on in bioinformatics?
We’re working on database-searching algorithms, statistical validation of data, [and] figuring out spectral quality prior to putting spectra into the process. We’ve also been working on the back end of the problem: How do you sort through all the information; how do you compare sets of information? We’ve got a paper accepted for publication on a nice method for quantitation of peptides.
What aspect of bioinformatics has the biggest need for improvement?
I’d probably have to say spectral quality. Most people — when [they] think about the problem superficially — think it’s an easy problem, but it’s actually a really tough problem. It’s pretty easy to tell when you’ve got a good spectrum and when you’ve got a really bad spectrum, but it’s this gray zone in the middle where it’s tough to tell when one of those spectra is likely to yield an answer from a database-searching process. We’re working on computer algorithms that will try to identify the features in spectra that lead to reliable identifications.
What instrumentation and techniques in proteomics still need major improvement?
In top-down proteomics, [we need to improve] the methods for performing high resolution mass spec for protein analysis, which is a real requirement; one needs to create an FT-MS that is reliable and easy to use. You need to have good efficient methods for fragmenting intact proteins, and there’s a method that shows a tremendous amount of promise called ECD [electron capture dissociation]. On bottom-up approaches, we need better methods for multi-dimensional liquid chromatography to get more resolution in a shorter period of time. In terms of mass spectrometers, we need mass specs that scan faster, have better mass resolution, better mass accuracy, more sensitivity, and better dynamic range. On the informatics side, we’d like things to go faster and be more accurate.
What about improvements in front end techniques?
What would be great would be for cell biologists to come up with better methods for fractionating components of the cell. What I think is becoming increasingly clear as proteomics techniques become better and better is that all these enrichment methods that cell biologists have been using for years are not very good at enriching components. That became pretty clear when I had a post-doc at the University of Washington who performed a nuclear preparation in yeast. He took an aliquot of that, digested it, ran it through the mass spec, and discovered that the most abundant components of that standard nuclear preparation procedure [were] cytoplasmic proteins, not nuclear proteins. So a lot of these methods enrich well enough for techniques like Western blots, but they don’t enrich sufficiently for the new proteomic methods.
What future plans does your lab have?
We’re just now starting a brain proteomics project with rat brains. We’re going to compare a population of membrane proteins that one observes in rats of different ages. There’s a large portion of technology and methodology development that’s going to go into this. That’s probably the highlight.