At A Glance
Name: Andrej Shevchenko
Position: Group Leader, Max Planck Institute of Molecular Cell Biology and Genetics, Dresden, Germany, since 2001
Prior Experience: Staff scientist, EMBL, 1998-2001
Postdoc, peptide & protein group (Matthias Mann), EMBL, Heidelberg, 1994-1997
(Senior) research scientist, Institute for Analytical Instrumentation of the Russian Academy of Sciences, St. Petersburg, Russia, 1991-1994
PhD in chemistry, Leningrad Institute of Technology, Russia, 1991
How did you get into mass spectrometry and proteomics?
I started doing mass spectrometry in St. Petersburg, formerly Leningrad, at the Institute for Analytical Instrumentation, where electrospray was actually described at about the same time [that John] Fenn [described it in the US]. I started as a protein chemist but then became interested in mass spec, electrospray ionization, and applications of the technology to solve biological problems.
The focus of my PhD was the degradation of peptides in body fluids; I worked on blood and gastric juice. I used synthetic peptides as markers and investigated how degradation of the peptides proceeded in this multi-enzyme environment. Mass spectrometry became the key technology because the profiles of the digestion products were quite complex, and they were quite difficult to decipher by other methods.
Then I was looking for a postdoc, and I got invited by Matthias Mann to EMBL, where I spent seven years in total, first in Matthias’ group and then, once he left for Odense, as a staff scientist.
What did you do during your time at EMBL?
When I started, even the term “proteomics” had not been coined yet. Matthias had a bunch of interesting technologies involving nanoelectrospray ionization, but [his lab was] not quite successful at applying them to biological problems because the interface procedure — how you go from the gel to the tandem mass spec experiment — hadn’t been developed at the time. That’s what I did first, rationalize and simplify the handling protocols, so that you could get femtomole amounts of proteins on a gel, digest them with certain proteases, and then go for nanospray analysis.
We came up with a very simple solution, which was published in 1996 in Analytical Chemistry. This paper is amazingly popular; it has had close to 900 citations over the last six years. Instead of complicating the protocol, we just simplified things; that’s probably why people love it.
Also, there was a common belief at the time that sequencing proteins from silver-stained gels would not be possible, because the silver stain introduces modifications into the protein. However, we found out that silver-staining is an absolutely safe method.
Then we looked to see if we could scale up the technology and go for a little higher throughput. In the same year, we published another paper, and we came up with a two-layer solution: a first quick screening by MALDI, and then, if necessary, an in-depth analysis by electrospray. We identified about 150 proteins from a two-dimensional gel. That paper is also fairly popular; it has about 350 citations.
Then we saw that what was missing was de novo sequencing. So we came up with a very simple solution for de novo sequencing using quadrupole time-of-flight and peptide isotopic labeling. It worked so nicely that we had tons of papers later on.
Matthias then looked for further opportunities and moved to Odense, and I started doing something else. By that time, we had pretty good success at identifying members of protein complexes isolated by immunoaffinity chromatography. But it was obvious that those complexes do not act alone: There had to be some physical connections between them; they do not talk to each other by cellular phone. So we started a collaboration with Ray Deshaies from Caltech on isolating yeast proteins and identified a large number of totally unexpected links between protein complexes.
How is your approach different from the TAP tag method used, for example, by Cellzome?
The TAP tag approach is the tool. You can use TAP tagging on a genomic scale, or you can use it sequentially. It turns out that the differences are huge. What the genomic approaches very often do is artificially merge protein complexes. We found that in the yeast genome, about a quarter of proteins are involved in more than one, and in many cases more than two, protein complexes, which are functionally distinct. What we are trying to understand now is what those associations mean; we used to call them “proteomics hyperlinks.” We are doing this on a much smaller scale, but we are doing it very accurately. On average, we get five to six reads from each and every complex. If the protein complex contains five or six subunits, all of these will be tagged and ID’d. We are getting individual protein complexes, and we are getting subunits which are shared between the complexes.
The value of Cellzome data is that it’s big; it covers a substantial part of the proteome. It’s a good starting point; in a sense it’s like ESTs. We are producing very redundant data, but that’s how you notice small changes, and that’s the way you notice that we are in fact talking about two different protein complexes, rather than one.
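[Editor's illustration: the idea of subunits shared between separately purified complexes can be sketched in a few lines of Python. This is only a minimal sketch, not the lab's actual analysis pipeline; the function name `shared_subunits`, the complex names, and the protein identifiers are all hypothetical placeholders.]

```python
from collections import defaultdict


def shared_subunits(complexes):
    """Return proteins found in more than one complex.

    `complexes` maps a complex name to the set of proteins identified in it.
    The result maps each shared protein to the sorted list of complexes
    in which it was found.
    """
    membership = defaultdict(set)
    for complex_name, subunits in complexes.items():
        for protein in subunits:
            membership[protein].add(complex_name)
    # Keep only proteins that appear in at least two complexes.
    return {p: sorted(c) for p, c in membership.items() if len(c) > 1}


# Hypothetical placeholder data, not taken from the interview.
complexes = {
    "complex_A": {"P1", "P2", "P3"},
    "complex_B": {"P3", "P4"},
    "complex_C": {"P2", "P3", "P5"},
}
shared = shared_subunits(complexes)
for protein, found_in in sorted(shared.items()):
    print(protein, "is shared by", ", ".join(found_in))
```

[Keeping each purification as its own set, rather than pooling everything that co-purifies, is what lets the shared proteins show up as links between distinct complexes instead of merging them into one.]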
Where did you move on to from EMBL?
I decided to take an offer from the new Max Planck Institute in Dresden. At the moment, we have two mass spectrometers here, a Qstar from MDS Sciex, and a Reflex from Bruker. I am looking forward to buying a third one, [probably an] ion trap. We have eight people in the lab now, and I don’t think we will ever grow much, because otherwise it’s no fun for me.
What do you work on now?
Despite the huge success of genomic sequencing, if you just count the number of organisms which have had their genome sequenced, it’s fewer than ten, not counting bacteria. Basically, there is no way that genomic sequencing will ever catch up [with the number of species], and I don’t think it’s necessary, because a lot of proteins can be identified using the sequences already available in the databases. Unfortunately, the technology available is not that good for that task because it relies on exact matching of peptides.
So what we have come up with is the so-called MS BLAST approach. We developed it into a web-accessible tool, so that anybody who is working on an organism with an unsequenced genome can use it. [Even if the mass spec data are] inaccurate, the software takes care of this and picks up the right matches, and we do identify a lot of distantly related proteins. We now have some very good applications in plant biology and insect biology running, and also in mammals.
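[Editor's illustration: the contrast between exact peptide matching and similarity-based matching can be sketched as below. This is a toy example under stated assumptions and is not the MS BLAST algorithm itself; the function names, the 0.7 identity threshold, and the sequences are hypothetical and chosen only for illustration.]

```python
def exact_hit(peptide, protein):
    """Exact matching: the peptide must occur verbatim in the database protein."""
    return peptide in protein


def best_ungapped_identity(peptide, protein):
    """Slide the peptide along the protein without gaps and return the
    highest fraction of identical positions found in any window."""
    n = len(peptide)
    if n == 0 or len(protein) < n:
        return 0.0
    best = 0
    for start in range(len(protein) - n + 1):
        window = protein[start:start + n]
        matches = sum(a == b for a, b in zip(peptide, window))
        best = max(best, matches)
    return best / n


def tolerant_hit(peptide, protein, min_identity=0.7):
    """Similarity matching: accept the hit if enough positions agree.
    The threshold is an arbitrary assumption for this sketch."""
    return best_ungapped_identity(peptide, protein) >= min_identity


# Hypothetical database protein from a sequenced organism.
db_protein = "MKTAYIAKQRQISFVKSHFSRQ"
# Hypothetical peptide from a related, unsequenced organism:
# one residue differs from the database sequence (Q -> N).
query_peptide = "AYIAKQRNISF"

print("exact match:   ", exact_hit(query_peptide, db_protein))
print("tolerant match:", tolerant_hit(query_peptide, db_protein))
```

[A single substitution is enough to defeat the exact search, while the similarity score still points to the related database protein, which is the general reason a homology-based search helps with unsequenced genomes.]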
Have you commercialized MS BLAST?
It’s on the web, but I am pushing companies to commercialize an interface to MS BLAST. Applied Biosystems has already done so: it’s in their software package; Bruker now has it, and I hope very much that [Thermo] Finnigan will have it very soon.
What else have you done?
Another focus of my work here in Dresden is working with lipids, to simultaneously detect lipids in very complex mixtures. [Part of the reason I became interested in lipids is that] proteomics is becoming too crowded. We have reached the stage where it’s not creativity which is important, but rather the number of resources you put in: Two mass spectrometers are better than one, and four are certainly better than two.
Also, we are becoming very interested in an evolutionary approach [to protein complexes]. We want to map them very accurately, for example, to compare protein complexes in budding yeast, fission yeast, C. elegans, and Xenopus (really different species), then isolate orthologous proteins, and then apply bioinformatics to try to understand what those associations mean. We see that the protein composition is conserved, but at the same time, that links to other protein complexes are not conserved. That brings some new understanding of how the proteome is organized.
Where do you see proteomics and proteomics technology going?
I think this multi-organism perspective is going to be a big thing. I wouldn’t be surprised to see a huge breakthrough of proteomics in plants, where you can get any plant protein by sequence homology searching methods. Plant genomes are huge, but using the methods which we and others developed, you are not limited to using Arabidopsis for [your research]. The same probably applies to insects, because the diversity of insects is enormous. I wouldn’t be surprised if in some years, there will be a big impact [of proteomics] in plant and animal biology. The scope will really expand.
With regard to instrumentation, I would predict that the instruments will differentiate. At the moment, the tendency is to go to instruments of higher complexity: We have very nice combinations like quadrupole time-of-flight, and ion trap-FTMS is coming. But the instruments are getting bigger, the cost is getting higher, and in fact, the instruments are becoming so complex that they are not flexible any longer. People have to decide whether they want to have a high-throughput kind of protein cruncher and go for a very challenging task of enormous complexity. That’s slowly coming, and that’s one branch. But the second branch will be mass spectrometers for small people. Once I was asked what we will start doing once this proteomics hype is over. I answered, ‘we will get back to science.’ There will be instruments for flexible applications, for trying things out. My prediction is that those types of instruments will not disappear from the market.