At A Glance
Position: Professor, Department of Chemistry and Biochemistry, University of Colorado at Boulder (faculty since 1992)
Associate Investigator, Howard Hughes Medical Institute (started 1994)
Background: Research Assistant Professor, University of Washington, 1990-1992
Postdoctoral Fellow, University of Washington, 1985-1990 (worked with Edwin Krebs, Christoph de Haën)
PhD in Chemistry, University of California, Berkeley, 1985 (worked with Judith Klinman)
BS in Chemistry, University of Washington, 1979
How did you get involved with proteomics?
I was trained as an enzymologist. During my time as a postdoc at the University of Washington in Seattle, I met Ken Walsh, who was one of the first people using the power of mass spectrometry in protein applications. It was very clear to me that mass spec was going to be one of the best approaches to monitoring chemical modifications. At that time, I was working in Ed Krebs’ lab — I was one of the discoverers of the MAP kinase pathway.
Then I moved to Colorado in 1992 as assistant professor [at the University of Colorado in Boulder]. Katheryn Resing, who trained with Ken in Seattle, also moved there; she was the first person to analyze the phosphorylation sites of a protein called profilaggrin, using classical radiolabeling. In Colorado, we decided ‘let’s apply mass spectrometry to signaling,’ which we have been doing ever since.
When we first decided to get a mass spec in 1992, half the faculty said, “what for?” In 1994, when I was appointed to Howard Hughes, we received a PE Sciex API3 triple quadrupole mass spectrometer. It was the first mass spec for biomolecular research at the Boulder campus.
What did you use your mass spec for?
Our first applications were in mapping phosphorylation sites. When we first started, we did top-down studies on ribosomal protein complexes. In 1996, we determined the masses of ribosomal proteins with the intent of identifying covalent modifications. We compared their observed masses to the expected masses and examined the types of modifications. Many were what you would expect — certain N- and C-terminal modifications. But some mass changes didn’t correspond to any known modifications, although we weren’t able to identify these chemistries with the methods then available.
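The mass-comparison step described here amounts to a lookup: subtract the expected protein mass from the observed mass and match the difference against known modification mass shifts. A minimal sketch of that idea follows; the shift values are standard monoisotopic deltas, but the tolerance and the example masses are illustrative, not values from this work.

```python
# Illustrative sketch: match an observed-vs-expected mass difference
# against a small table of known covalent modification mass shifts.
# The tolerance and example protein masses are made up for illustration.

KNOWN_SHIFTS = {
    "acetylation": 42.0106,
    "methylation": 14.0157,
    "phosphorylation": 79.9663,
    "loss of initiator Met": -131.0405,
}

def match_modification(observed, expected, tolerance=0.05):
    """Return names of known modifications consistent with the mass shift."""
    delta = observed - expected
    return [name for name, shift in KNOWN_SHIFTS.items()
            if abs(delta - shift) <= tolerance]

print(match_modification(10542.31, 10462.35))   # shift ≈ 79.96 → ['phosphorylation']
print(match_modification(11000.00, 10900.00))   # no known match → []
```

A mass change that matches nothing in the table, as in the second call, corresponds to the unidentified chemistries mentioned above.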
The third early application was the development of hydrogen exchange mass spectrometry techniques. We were the first to apply this to looking at the effects of kinase activation, in particular of a MAP kinase kinase, MKK1.
Around 1996 or so, a number of studies came out describing in-gel digestion to identify proteins from 2D gels by mass spectrometry. We applied that technology to study responses to signaling processes — in 2000, we identified targets of MAP kinase signaling in mammalian cells. In that study, we comprehensively identified changes in spots on 2D gels and obtained unambiguous sequences. We found 41 targets of cell stimulation and 25 MAP kinase targets. This was one of the first illustrations of how you can use a protein screening approach to find new signaling targets.
What have you done lately using proteomics?
Overall, our contribution to proteomics is applying methods and developing protocols that are robust enough to apply to real-life problems, and then validating them, using molecular and pharmacological techniques.
We have continued to use 2D gels. For example, we screened 19 cancer cell lines for markers of cancer progression. Those cell lines were given to us by Meenhard Herlyn at the Wistar Institute, who has made hundreds of cell lines from biopsies from melanoma patients. When we first started, we were really struck by how difficult this was going to be, even with the best technology. That prompted us to switch to proteomics analysis by shotgun methods, following leaders like John Yates. In early 2002, we developed methods to do shotgun proteomics on human samples, using the same samples we analyzed by 2D gels. We found it pretty straightforward to set up methods to collect the data — that took a few months — but we quickly ran into problems during the data analysis. There are several programs for matching MS/MS data against peptide sequences, but we found that they give you large mis-assignment rates. We also found that common methods of using threshold cutoffs lead to a large loss of usable data — our estimate is that you lose about half of the usable data.
We then developed a strategy where we combine Sequest and Mascot. We examined datasets searched by both programs. When the results were above a certain threshold, we accepted the assignment straightaway. For assignments with scores below the threshold, we looked for consensus between the two programs. When the two programs agreed, about half the assignments were correct, and we looked for reasons why the others were incorrect. We then developed some heuristic rules that would allow you to filter out the incorrect assignments. That’s how we developed MS-Plus.
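The decision logic described here might be sketched roughly as follows. The score names, thresholds, and the simple sequence-agreement check are illustrative placeholders, not the actual MS-Plus parameters or heuristics:

```python
# Rough sketch of a two-search-engine consensus filter in the spirit of
# the strategy described above: accept confident hits from either
# program outright; below threshold, require agreement between the two.
# Thresholds and scores here are illustrative, not MS-Plus internals.

SEQUEST_THRESHOLD = 2.5   # e.g. an XCorr-like cutoff (illustrative)
MASCOT_THRESHOLD = 40.0   # e.g. an ion-score cutoff (illustrative)

def accept_assignment(sequest_score, mascot_score,
                      sequest_peptide, mascot_peptide):
    """Accept high-scoring hits outright; below threshold, require
    that both programs assign the same peptide sequence."""
    if sequest_score >= SEQUEST_THRESHOLD or mascot_score >= MASCOT_THRESHOLD:
        return True
    # Below threshold: fall back to consensus between the two programs.
    return sequest_peptide == mascot_peptide

print(accept_assignment(3.1, 25.0, "LVSK", "AGGR"))   # high Sequest score -> True
print(accept_assignment(1.8, 22.0, "LVSK", "LVSK"))   # consensus -> True
print(accept_assignment(1.8, 22.0, "LVSK", "AGGR"))   # neither -> False
```

In the actual work, consensus alone still left about half the below-threshold assignments incorrect, which is what motivated the additional heuristic filtering rules.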
We have also developed an approach for assigning the peptide sequences to the correct protein entries. If you have a peptide sequence that’s in two different proteins, e.g. splice variants or isoforms, sometimes the programs will match one spectrum to one protein entry and the other spectrum to another protein entry, without telling you that it’s actually based on the same peptide sequence. You think you have got two proteins in your list when, in fact, you really only had one peptide sequence. There is a program from the Yates lab that deals with this, called DTASelect, but we were not satisfied with the display of the ambiguities in assignments. We wrote a separate program, Isoform Resolver, that uses a different strategy. Instead of searching the protein database, we made a list of all unique peptide sequences. We matched each peptide sequence to the proteins that contain that sequence. So when we see the peptide, we know how many possible protein entries can be represented. We have also included rules where we don’t distinguish between residues with similar masses, under the assumption that you can’t really tell the difference. Both of these steps minimize the inflation of the protein count, which we estimate is 25 percent if you don’t apply this.
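The peptide-indexing idea behind this strategy, as described, can be sketched in a few lines: enumerate the unique peptides from each database entry, then map each peptide to every protein that contains it, so a shared peptide is visibly shared rather than silently assigned to two entries. The toy sequences and the simplified digestion rule below are illustrative, not Isoform Resolver's actual implementation:

```python
# Sketch of a peptide index: map each unique (tryptic) peptide to all
# protein entries containing it, so shared peptides between isoforms
# are recognized instead of inflating the protein count.
# Sequences and the cleavage rule are simplified for illustration.

from collections import defaultdict
import re

def tryptic_peptides(sequence):
    """Cleave after K or R (ignoring missed cleavages and proline rules)."""
    return [p for p in re.split(r"(?<=[KR])", sequence) if p]

def build_peptide_index(proteins):
    """Map each unique peptide sequence to all proteins containing it."""
    index = defaultdict(set)
    for name, seq in proteins.items():
        for pep in tryptic_peptides(seq):
            index[pep].add(name)
    return index

# Two entries sharing a peptide, e.g. splice variants (toy sequences).
proteins = {
    "isoform_A": "MAGKLVSTR",
    "isoform_B": "MAGKTTPR",
}
index = build_peptide_index(proteins)
print(sorted(index["MAGK"]))   # shared peptide maps to both entries
```

With this index, a spectrum matching MAGK is reported as consistent with both isoforms rather than counted as evidence for two distinct proteins.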
We are currently working on assembling MS-Plus and Isoform Resolver for distribution, which we hope will be completed in two to three months.
Are you developing any other methods?
What we are also working on now is developing methods to analyze differences in proteins quantitatively. A number of approaches have been published; particularly popular these days is differential isotopic labeling. However, at least in the published data, the proteins that one can observe and quantify by isotopic labeling are the same proteins you can see on 2D gels.
What we have been trying to do for the last year or so is to develop methods to look at direct intensity measurements as a readout of relative changes in protein abundance. Our preliminary results suggest that it’s working fairly well. Whether it works as well as isotopic labeling, we don’t know yet, but we think it may. We are applying a second approach, which is to use statistical sampling of the number of spectra that you get for any given peptide as a readout of protein abundance. If a protein is very highly abundant, the chances are that you get higher coverage, and you sequence peptides in that protein more often. And that’s working out, too, with human samples. The combination of both methods may turn out to be most effective.
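The spectral-sampling idea described here can be sketched as a simple tally: count the MS/MS spectra assigned to each protein in two conditions and use the ratio as a rough relative-abundance readout. The data and the pseudocount below are made up for illustration; normalization and statistics in real analyses are more involved:

```python
# Illustrative sketch of spectral counting as a relative-abundance
# readout: tally identified spectra per protein in two samples and
# report a ratio. Data and the pseudocount are invented for this example.

from collections import Counter

def spectral_counts(spectrum_assignments):
    """Count identified spectra per protein from (protein, peptide) pairs."""
    return Counter(protein for protein, peptide in spectrum_assignments)

def abundance_ratio(counts_a, counts_b, protein, pseudocount=1):
    """Relative abundance of one protein between two samples;
    a pseudocount avoids division by zero for unseen proteins."""
    return (counts_a[protein] + pseudocount) / (counts_b[protein] + pseudocount)

sample_a = [("ERK2", "PEP1"), ("ERK2", "PEP2"), ("ERK2", "PEP1"), ("ACTB", "PEP9")]
sample_b = [("ERK2", "PEP1"), ("ACTB", "PEP9"), ("ACTB", "PEP8")]

a, b = spectral_counts(sample_a), spectral_counts(sample_b)
print(abundance_ratio(a, b, "ERK2"))   # (3+1)/(1+1) = 2.0
```

This captures the intuition in the interview: a highly abundant protein yields higher coverage and is sequenced more often, so its spectrum count rises with abundance.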
What real-world problems are you applying your methods to?
We apply our methods to responses to signal transduction. We are very interested in how responses change when multiple signaling pathways are activated in combination. We then apply those kinds of rules to disease progression, particularly cancer progression, where different combinations of signaling pathways are constitutively active depending on the stage of the disease.
We are still working with melanoma, where we have these different cell lines representing different cancer stages. We are taking melanoma cell lines from early biopsies, and we are turning on different pathways, using molecular and pharmacological approaches, to look at protein responses at a molecular level. Then we are taking the set of responses that we can monitor, and we are looking for those particular responses in the other cell lines, with the intent of identifying markers of cancer progression that also report targets of signaling pathways.
What are the major technical obstacles remaining in proteomics?
The technical obstacles are in obtaining higher sensitivity to be able to sample a greater percentage of the proteome. In our most recent paper in Analytical Chemistry, we were able to sample more than 5,000 human proteins from soluble extracts, not counting the membrane proteins. I think that’s a record. However, my estimate of the soluble proteome in a human cell — that is, the number of distinct open reading frames expressed in a given cell type — is somewhere around 10,000 soluble proteins, and maybe another 3,000 or so membrane proteins. So our guess is that we are probably sampling half of the soluble proteome. But we want to see almost everything: we want to see telomerase, which is present in, say, 100 copies per cell.