Jonathan Weissman on Tagging Proteomes and Being Loyal to Yeast


At A Glance

Name: Jonathan Weissman

Age: 37

Position: Associate professor of cellular and molecular pharmacology, University of California, San Francisco, since 1996.

Assistant investigator, Howard Hughes Medical Institute, since 2000.

Background: Was a lead author on two papers in the Oct. 16, 2003 issue of Nature describing the systematic identification, quantitation, and localization of 80 percent of the yeast proteome (see PM 10-17-03).

Post-doc in protein biochemistry, Yale University, 1993-96.

PhD in physics, MIT, 1993.

BA in physics, Harvard University, 1988.


Why did you decide to study the whole yeast proteome in this way?

One was a personal reason: [co-author] Erin O’Shea and I had started up [together] in the Howard Hughes Medical Institute, and it seemed like an opportunity to really start in a new area. We’d been graduate students together and known each other for a long time, so it was a great project to try together. We were both protein biochemists of sorts. We’re both getting interested in different aspects of systems biology and trying to get a more coherent picture of how a cell works, and there was enormous need for more information about where proteins were and how much of them there were. The tools for actually measuring how much protein was there were not very good.

What led you to decide to try using GFP and TAP tags?

There are mass spec-based approaches, but for the GFP there is no real alternative method — there we felt it was absolutely critical to be working with proteins expressed at the endogenous level and from full-length proteins. There had been other studies from overexpression studies or from random fusions and they were certainly valuable, but with yeast it was really possible to make the full-length proteins and do the experiment exactly as close as one would like to ideally [do].

There are 6,000 genes in the yeast genome, but you found that 700 genes don’t code for proteins. How did you figure that out?

It’s been known that there were a lot of spurious ORFs; a number of different methods had suggested that’s the case from a computational point of view. But this is the first large-scale experimental protein level data [that’s been collected]. The importance of this is that it now gives an experimental data set which can be used to validate computational approaches that distinguish real from non-real ORFs. This is going to be a huge problem in annotation for more complex organisms. It’s already hard with yeast, but you can imagine with humans where the genome is much larger and more complex, it’s a much bigger problem.

Our criteria [for determining spurious ORFs] were two-fold — one: ‘did you see it or did you not?’ We knew that our false positive rate was very low — when we saw a protein, it was very likely to be real. But we also knew there were a number of reasons we wouldn’t see proteins. An alternative is that the protein was there but was beyond the limits of our detection, or maybe the fusion protein was nonfunctional. But by looking at the properties of the protein — like what amino acids it uses and what codons it uses — we’re able to help distinguish between real and non-real ORFs, and putting together the two criteria of what did the gene look like in terms of amino acid composition, and ‘could we see it or could we not,’ we were able to identify most of the spurious ORFs. It was sort of a dual computational and proteomics approach.
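The two-criteria logic described above can be sketched in a few lines of Python. This is only an illustration, not the authors' actual pipeline: the `OrfEvidence` record, the `sequence_score` field (a stand-in for whatever codon and amino acid composition measure was used), and the 0.5 cutoff are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class OrfEvidence:
    name: str              # ORF identifier (hypothetical placeholder names below)
    detected: bool         # was the tagged protein seen experimentally?
    sequence_score: float  # assumed 0-1 score; higher = more gene-like composition


def classify_orf(orf: OrfEvidence, score_cutoff: float = 0.5) -> str:
    """Combine the experimental and computational criteria into one call."""
    if orf.detected:
        # Low false-positive rate: a detected protein is very likely real.
        return "real"
    if orf.sequence_score < score_cutoff:
        # Not seen AND does not look like a gene: flag as likely spurious.
        return "likely spurious"
    # Not seen but gene-like: may be below detection limits or a
    # nonfunctional fusion, so no call is made.
    return "undetermined"


examples = [
    OrfEvidence("orf_a", detected=True, sequence_score=0.2),
    OrfEvidence("orf_b", detected=False, sequence_score=0.1),
    OrfEvidence("orf_c", detected=False, sequence_score=0.9),
]
calls = {orf.name: classify_orf(orf) for orf in examples}
```

The point of the sketch is that a missing signal alone is never enough to call an ORF spurious; it has to coincide with an atypical sequence composition.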

You also managed to achieve very high sensitivity — what were you doing differently?

Most groups in the past used mass spec, and mass spec can be very sensitive, but it’s hard to distinguish low abundance proteins in the context of much more abundant proteins, so it’s more a matter of specificity than sensitivity per se. By tagging the proteins with these epitope tags, you limit the background because now there’s only one protein recognized by these antibodies — we used the fact that these antibodies were extremely specific and had very high sensitivity.

Are you going to move on to mammalian cells now?

One approach that I think will be important will be to use tagging in more complex organisms like humans. It will be straightforward to make 30,000 GFP proteins or 30,000 TAP-tagged human proteins. This is being done by a number of people (see PM 10-24-03). It’s much harder to have them expressed at their endogenous level, but I think you can still get a fair amount of information expressing them with artificial promoters. So one route is to use these tagged fusion libraries and do localization and protein interaction in mammalian cells. But the other is to use these data to really understand at a much more sophisticated level how yeast work. Yeast have become the premier organism for doing systems-level biology. So we’d like to watch dynamic changes in protein abundances, lifetimes of all the proteins, and different activities of all the proteins — and try to get general principles that we can use for more complex cells.

So what will be the next step for your lab?

The immediate thing is measuring changes in protein abundance over time. One example would be to arrest at different stages in the cell cycle and look at which proteins are changing in abundance.

We’re also trying to develop much faster approaches — either by optimizing what we’re doing or by using the fluorescence signal to measure by fluorescence cell sorting or cell abundance measurements. It’s possible to do high-throughput FACS analysis so you can measure the abundances of 1,000 proteins a day.

Are you looking to commercialize this technology?

That’s not our main interest at this point.

What obstacles face proteomics research right now?

A large part of it, with mass spec, is getting completeness. It’s relatively easy to look at the most abundant corridors of the proteins, but many of the critical proteins are poorly expressed. So increasing the sensitivity and speed [is important]. Another major area is post-translational modifications. Trying to get standardized methods for looking at whole proteomes is also a big job. The thing that makes proteomics hard and interesting at the same time is that every protein has its own personality and its own physical properties that let proteins do so many things. But it also makes it very hard to make a generic one-size-fits-all approach. Conceptually, that’s what the tags were about: we thought by giving all the proteins the same tags, you could then treat the tags as a generic one-size-fits-all strategy. So by having the tagged library, you can now reduce it to a simpler problem.

You cited Mike Snyder’s work in your paper — how are protein chips complementary to your methods?

They are highly complementary: protein chips will be a very fast and a very good way of getting at certain types of protein-protein interactions. But there are a lot of other things where it’s really critical to have multi-protein complexes, and the right abundances of the proteins, or where proteins just won’t be well-behaved on a chip. Their approach was to typically overexpress it, purify it, and put it on a chip. But that wouldn’t work for example for ribosomes, where you need 20 proteins and rRNA. Our approach would be to express [the protein] at endogenous levels and then purify it, using the fact that they’re tagged. Certainly anything you can do on a protein chip, you can probably do faster that way, but I think there will be whole classes of problems that will be hard to address [with chips]. There are some things, like GFP localization, that you’re not going to get from protein chips. It’s not a good method for looking at changes in protein abundances either.

So what’s the take-home message?

The big take-home message is that the proteome of an entire organism is now accessible in a way that just wasn’t possible before. That’s in my mind the most exciting thing — now it’s possible to be thinking about doing experiments that were not really addressable by any other means.

Let’s say you have a particular enzymatic assay, for example, where you want to find what the kinase was that was phosphorylating your particular protein. It’s now possible with these libraries to screen through the whole yeast proteome fairly rapidly. Or if you want to know [whether] your protein abundance changes through the cell cycle, or what transcription factors are changing through the cell cycle, you could order the 200 or so transcription factors and follow their abundances. The transcription factors were just too low in abundance to measure by mass spec. You can also see if your protein changes in some condition you’re interested in, you can pull down the protein and ask what other proteins are interacting with it, or you can use your GFP library and say ‘does [the protein] move in or out of the nucleus,’ for example. These were all things you could have done before if you made the library, and if you made the tag, but now it lowers the barrier to doing that. We’ve made all these things available at a nominal cost through a distributor.

So you’re lowering the barrier to doing a lot of things that would have been more difficult before?

Yes, plus [providing] the methods and reagents. Plus, you can look up the data, so now if you clone a protein and want to know where it is, you can go to a website and see the pictures of where it’s localized. You can not only see where we thought it was localized, but you can look at the pictures yourself and say, ‘does it look like there’s something else in there?’
