Head of plant proteomics
At A Glance
Name: Sacha Baginsky
Position: Head of plant proteomics, Institute of Plant Sciences, Swiss Federal Institute of Technology in Zurich, since 2001.
Background: Postdoc, University of California at Berkeley, 1999-2000.
Scientific assistant in department of plant physiology and molecular biology, Ruhr-University Bochum, Germany, 1994-1999.
PhD, Ruhr-University Bochum, 1998.
Sacha Baginsky is scheduled to give a talk on plant proteomics on Dec. 6 at the Swiss Proteomics Society congress. ProteoMonitor decided to talk with Baginsky to find out about his work and his background.
What is your background in terms of proteomics?
My original background is in plant biology, so I've been working at the University of Bochum and U. C. Berkeley, and there I got interested in plastid development and differentiation. So in general, plant cell organelle development and differentiation. And in 2000, I got in contact with Don Hunt, and he suggested that looking at the proteome of plant cell organelles would be a great opportunity to find potential players in differentiation processes. And we followed up on that. So that was the end of 2000, early 2001, that I started doing proteomics with plant cell organelles.
I then went over to Zurich around 2000 and took over a proteomics group. I established basically a mass spec proteomics group here for plant organelle proteome analysis. That was how I got into that. So my driving force was originally exclusively biological, but we have now also entered into a couple of collaborations to develop tools for protein de novo sequencing, quality scoring of peptides, and more technical aspects of the whole issue. These were questions that arose originally from limitations of proteome analysis.
It's clear that some proteins you cannot detect with currently employed proteomic techniques because they are not in the database, or they're post-translationally modified, or alternatively spliced. So that got us interested in more technical aspects of the whole issue. I've been quite busy with that as well. I have these two basic things running in parallel: biological questions connected to differentiation, and the technical aspects.
What, more specifically, are the technical issues that you are working on?
One example is de novo sequencing with the hidden Markov model that we are trying to establish. That means extracting an amino acid sequence exclusively from the data in an MS/MS spectrum, and not using a database. So that is one aspect that we are working on. The other is using quality scoring for spectra. So a spectrum has a certain quality depending on whether it's derived from good peptide fragmentation or from some noise fragmentation, and you can distinguish that. You can distinguish good peptide-derived spectra from noise-derived spectra. And when you apply quality scoring, you apply a couple of heuristics to these spectra, and then automatically assess the quality of these spectra with a scoring tool.
What we're currently suggesting is to do a normal database search with high-throughput proteomics data. Then you come up with a list of identified proteins. And when you analyze these data again with a quality-scoring tool, then you can find a group of MS/MS spectra that were not assigned in the database, but have a high quality, so they are likely to be derived from peptides. And with these you would do additional searches, like allow for post-translational modifications, do genome searches or EST searches. This is all not possible with a complete set of high-throughput proteomics data, because that's normally just too many spectra. When you just increase the search space for your data, you generate lots of false positives, and that restricts the general applicability of such searches to high-throughput data. That's why you need such tools that can automatically score these spectra.
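The workflow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the scoring tool used by Baginsky's group: the `Spectrum` class, the `quality_score` heuristic (share of total ion intensity carried by the strongest peaks), and the threshold values are all hypothetical choices made for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Spectrum:
    scan_id: str
    intensities: List[float]  # peak intensities of one MS/MS spectrum
    assigned: bool            # True if the database search matched a peptide


def quality_score(spectrum: Spectrum, top_n: int = 5, min_peaks: int = 10) -> float:
    """Toy heuristic (an assumption for this sketch): fraction of the total
    intensity carried by the top_n peaks. Spectra with too few peaks score 0."""
    if len(spectrum.intensities) < min_peaks:
        return 0.0
    total = sum(spectrum.intensities)
    if total == 0:
        return 0.0
    top = sorted(spectrum.intensities, reverse=True)[:top_n]
    return sum(top) / total


def select_for_extended_search(spectra: List[Spectrum], threshold: float = 0.6) -> List[str]:
    """Return scan IDs that were NOT assigned by the normal database search
    but look peptide-like according to the heuristic score."""
    return [
        s.scan_id
        for s in spectra
        if not s.assigned and quality_score(s) >= threshold
    ]


# Hypothetical example data.
spectra = [
    Spectrum("good_unassigned", [100, 90, 80, 70, 60] + [2] * 20, assigned=False),
    Spectrum("noise_unassigned", [5] * 30, assigned=False),
    Spectrum("good_assigned", [100, 90, 80, 70, 60] + [2] * 20, assigned=True),
    Spectrum("sparse", [100, 50], assigned=False),
]
print(select_for_extended_search(spectra))  # ['good_unassigned']
```

Only the small, high-scoring subset would then be passed to the more expensive searches (modifications, genome or EST data), which keeps the enlarged search space from being applied to the full data set.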
The genomes of many biologically relevant plants are not sequenced. That is an issue. I mean, rice is sequenced, Arabidopsis is sequenced, but that's basically it. There might be tobacco, but that's not available, so basically you're dealing with Arabidopsis and rice. Relevant crops like cassava or others are not sequenced. If you want to analyze them, you need some de novo sequencing tools. That's why we see a necessity for such tools. Also, post-translational modifications are an issue. You could predict some, and allow the database searches to consider them, but this makes your search space too large, and might generate false positives. So you need some handle on post-translational modifications.
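A rough count shows why allowing modifications inflates the search space. The sketch below assumes a single variable modification (phosphorylation on S, T, or Y) and treats each site as independently modified or unmodified, so a peptide with k candidate sites has 2^k forms to test. The function name and the example peptides are hypothetical; in practice, search tools often limit the number of modifications considered per peptide.

```python
def count_modified_forms(peptide: str, modifiable: str = "STY") -> int:
    """Number of distinct forms of a peptide when every modifiable residue
    may independently carry the modification or not (2 ** number_of_sites)."""
    sites = sum(1 for residue in peptide if residue in modifiable)
    return 2 ** sites


# Hypothetical peptides, used only to show the growth.
for peptide in ["AAGK", "SAMPLETSYK"]:
    print(peptide, count_modified_forms(peptide))
# AAGK 1
# SAMPLETSYK 16
```

Every additional candidate form is one more chance for a spectrum to match something by accident, which is the false-positive problem described above.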
In terms of studying biological questions, are there certain organelles that you study?
Yes, it's plastids. So we are focused on plastids, and the plastids have the feature that they can develop and differentiate into different forms. They are highly relevant for biotechnology. When you look outside, the green leaves are all chloroplasts; this is one particular form of these plastids. In potatoes, starch is stored in amyloplasts, so this is another form of plastids. And in bell pepper or tomato, the red color is chromoplasts, another form of plastids. And we were interested in just profiling the proteomes of these different types of plastids, and also seeing what happens during the transition. We can experimentally induce the transition of one plastid type to another. What we are using is the light-induced greening of chloroplasts. So you can experimentally generate another form that is an etioplast. For example, when you have a plant growing under a mattress or so, you see that it's all yellow. That yellow color is due to etioplasts. So when you shine light on them, you see the chloroplast development: they start to green. And this transition is something we analyze with proteomics. Proteomics is great for that because most of the things that happen there are very rapid responses that involve post-translational modification. That's something you wouldn't see with transcriptomics, or other types of transcript analysis. That's why we think proteomics is a great tool for doing that.
Have you found significant proteins from those studies?
Sure. Well, it's known that chlorophyll-binding proteins are upregulated. Chlorophyll-binding proteins are those proteins that bind to the green color, or vice versa. So the green color that is the chlorophyll molecule binds to these proteins. This is necessary to prevent [destruction of the chloroplast]. These green colors are very, very photoreactive. That means when you shine light on them, they will be oxidized. And when they are not controlled, they can destroy the whole chloroplast. But the chloroplast avoids that by binding them to proteins, and protecting the cells from these photo-oxidative effects of chlorophyll.
So it is known that these are the first proteins that are upregulated by illumination — these proteins that are necessary for protection. What we found out in addition to that is that on the regulatory level, RNA-binding proteins are altered. So it looks like chloroplast development is paralleled by an increase of mRNA stability in the plastid. That is something that we have inferred from our data, since the first thing that we see at the regulatory level is that RNA-binding proteins are being phosphorylated. And we know from other in vitro experiments that phosphorylation of RNA-binding proteins increases their affinity to RNA, and this can protect the RNA from degradation by nucleases.
Do you think there could be some application for this in plant engineering?
Well, maybe. We need to know more about the plastid machinery that determines RNA stability, because when you think about engineering biotechnology, you think about plastids automatically. Plastids have some advantages for biotechnology. For example, they are in high number in the cell, so if you want to produce something with high efficiency, you would use the plastid. Plastids are not in the pollen, so they would not spread out. Certainly we need to know much better what is in the plastid in terms of nucleases and regulatory factors. That would help you to assess in advance how efficient your transgene approach could be.
Another possibility is to look at protease substrates. If you have a plant or fruit that is poor in proteins, this is mostly happening when you harvest the plant, because proteins are being degraded by proteases. We are currently screening protein-poor plants for those targets of proteins that are the first to be degraded by proteases. That will give us an idea of which proteases are active there.
These are only some applications that we can think of. In general, it's good to know more about what is happening in the plastid, and proteomics is a good screening method for this type of analysis.
What's the reason that you chose plastids as opposed to another organelle?
Well, plastids are from my research history. I first studied plastids. We have also started to include the vacuolar membrane in our analysis now. But the plastid is a very fascinating organelle because of photosynthesis. I mean, that's one of the most important biochemical reactions on this planet. And plastids are also able to synthesize many other important compounds that are essential for [the] human diet, for example branched-chain amino acids. In that context, it was of very high interest to me to keep on studying plastids.
What kinds of plants do you use for your studies?
Currently we are using Arabidopsis, which is generally a model plant for biologists for several reasons: it has a small genome, which is a well characterized genome, the genome is well annotated, and it has a short generation cycle so you can easily grow many, many Arabidopsis plants in a short time. This is important for functional genomics. Most of the knowledge that is available on plants is from Arabidopsis. Especially when you're looking at things like transcriptomics — look at all transcripts from the nucleus — that's another aspect of my research: How do transcript levels translate into proteins? This is something that is an unsolved matter, I would say. It's important for systems biology, because we are not able to predict the amount of a protein from the amount of a transcript. This is really important. For that reason, it's good to use a well-characterized system, because we can plunk our proteomic research into other research that is going on at the transcript level, but also at the metabolite level.
In addition to Arabidopsis, we are also using rice. That's very important for us as a crop plant, and it's developing into an excellent model also, similar to Arabidopsis. And we have started using also tobacco cell culture and bell pepper chloroplasts.
For your proteomic work, do you generally use 2D gels, followed by mass spec analysis?
Currently, for the analysis of the plastid type transition, we are using 2D gels. Post-translational modifications are very important, and you can really nicely see on a 2D PAGE when a spot has shifted towards a different pH or towards a different isoelectric point or molecular mass. So you can easily see post-translational modifications in the gel, while most of the mass spec-based methods that are applicable to relative protein quantitation would probably not reveal post-translational modifications, or at least would miss a lot. For ICAT for example, where you concentrate on cysteine-containing peptides, you must be very lucky if a post-translationally modified peptide is also a cysteine-containing peptide. And that is the reason we use 2D PAGE. The phosphorylation of the RNA-binding protein was the result of a series of 2D gel experiments.
What other projects do you have going on?
I work together with someone on vacuolar membranes. The vacuole is considered the 'trash bag' of the plant cell, so many, many compounds are stored there that should be kept away from important other functions of the cell. We're trying to find out what transporters are in the membrane — what we can expect to be transported across the vacuolar membrane — and also there's the question of, 'Is there more to the vacuole than just being the trash bag? Is it also important in cellular trafficking?' But this is a collaborative project, so I don't have the lead in there.
And then I have another project that deals with trying to improve the annotation of proteins of unknown function. So when we find a protein by mass spec, we can clearly state that this is no longer hypothetical, but it is expressed. We have teamed up with Kimmen Sjolander from the University of California, Berkeley, who is helping us to try to infer a function from the sequence of these particular proteins. And this could enable us to assign new functions to any organelle that we're working on.
And one more aspect is the development of additional tools for mass spec analysis; this goes to de novo sequencing and quality scoring of peptides.
For the de novo sequencing, are you developing new techniques?
Yes. We started out with developing a dynamic programming algorithm, but found out that several months before we found the algorithm, someone else published it. That was not so bad, but dynamic programming is one approach to de novo sequencing.
And the hidden Markov model approach is something that's completely new. I do that in collaboration with other people from ETH Zurich. I couldn't do that alone. The hidden Markov model gives you a probability that a certain sequence is correct. That is being published in Analytical Chemistry this week.
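To give a flavor of what dynamic programming means for de novo sequencing, here is a minimal toy sketch. It is not the algorithm Baginsky's group developed, nor the published one he mentions. It assumes idealized, integer (nominal) prefix-ion masses, uses a reduced residue table with one letter per mass (isobaric residues such as I/L and Q/K are merged), and simply looks for the chain of peaks from mass 0 to the total residue mass that explains the most residues. The function name and the example peak lists are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Nominal residue masses in Da; a reduced alphabet with one letter per mass.
RESIDUE_MASS = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "L": 113,
    "N": 114, "D": 115, "K": 128, "E": 129, "M": 131, "F": 147,
    "R": 156, "Y": 163, "W": 186,
}
MASS_TO_RESIDUE = {mass: aa for aa, mass in RESIDUE_MASS.items()}


def de_novo_sequence(prefix_masses: List[int], total_mass: int) -> Optional[str]:
    """Return the longest residue string whose prefix masses form a chain of
    observed peaks from 0 to total_mass, or None if no such chain exists."""
    nodes = sorted(m for m in set(prefix_masses) | {0, total_mass}
                   if 0 <= m <= total_mass)
    # best[j] = (residues on the best path from node 0 to node j, previous node)
    best: Dict[int, Tuple[int, Optional[int]]] = {0: (0, None)}
    for j in range(1, len(nodes)):
        for i in range(j):
            if i in best and (nodes[j] - nodes[i]) in MASS_TO_RESIDUE:
                candidate = best[i][0] + 1
                if j not in best or candidate > best[j][0]:
                    best[j] = (candidate, i)
    last = len(nodes) - 1
    if last not in best:
        return None
    # Walk back from the total mass to 0, reading one residue per step.
    residues = []
    j = last
    while best[j][1] is not None:
        i = best[j][1]
        residues.append(MASS_TO_RESIDUE[nodes[j] - nodes[i]])
        j = i
    return "".join(reversed(residues))


# Hypothetical peak lists for a peptide with total residue mass 312.
print(de_novo_sequence([57, 128, 150, 215], 312))  # GASP (150 is an unexplained noise peak)
print(de_novo_sequence([128, 215], 312))           # KSP (the peak at 57 is missing)
```

The second call shows why a confidence estimate matters: with one fragment peak missing, the same data are explained by a different sequence, because G plus A has the same nominal mass as K.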
What projects do you have planned for the future?
I would like to continue with this type of work, and I would go now more into the validation aspects. For example, we've found several proteins that are not predicted to localize to an organelle that are surprisingly there. The question is, 'Is this a contamination, or is it a true protein that goes into the organelle via a new pathway?' And I have started already analyzing thoroughly the targeting of proteins into organelles.
We can make a few predictions from our quantitative proteomics data. For example, for the rice plastid 2D PAGE map we have absolute quantities of proteins, so we can make predictions from that. We can sort them into pathways, and we can predict if one pathway is predominant over the other, and how the plastid is generating energy.
Also here, we want to do validation by metabolite flux analysis. Is it actually true what we're saying, that one pathway is predominant over the others? Then we should see at the metabolite level that the metabolites of this pathway are more abundant than the metabolites of the other pathways, or in general that the flux through this pathway is stimulated, as opposed to the others.
The last thing for the future is to look systematically at post-translational modifications. I've started looking at phosphoproteins. There are several methods for enriching phosphopeptides — this is something that we're doing for chloroplasts and etioplasts.