At A Glance
Name: Jonathan Jarvik
Position: Professor of biological sciences, Carnegie Mellon University, Pittsburgh, Pa., since 1978.
Chief Scientific Officer, Sequel Genetics, Pittsburgh, Pa., since its founding in 1999.
Background: Post-doc in cell biology, Yale University, 1975-8.
PhD in microbiology, Massachusetts Institute of Technology, 1975.
BA in history, Columbia University, 1967.
How did you get involved with proteomics?
Some years ago, in the ’90s, I wanted to isolate an organelle from a cell and look at its composition, and there was no good way to isolate it. It was in having to deal with that challenge that I came up with a way to do it that involved putting peptide tags into the proteins that composed the organelle. This method targeted the genome by actually putting a guest exon into an intron in the gene, and the consequence was a guest peptide in the protein.
When we started doing this in a serious way in mammalian cells, we began using GFP as the guest peptide, so now you have the chance to see individual protein species in a cell with fluorescence. Then any gene in principle we could hit, because we began to target the genome in a semi-random way with retroviral vectors. So we’re in a certain sense in the realm of proteomics because we’ve got a lot of cell lines, each one tagging a different protein. If we put them side by side, we see the two different proteins and how they behave, and if we put 100 of them side [by side], then we could [theoretically] see the behavior of all 100, one cell at a time.
Two colleagues at CMU were involved in this. One coined a useful term for what we’re doing: location proteomics — where you’re getting information about the individual proteins in terms of their subcellular localization. But that’s not the only thing you can learn with this approach, which is called CD tagging.
Originally we were working in green algae, and I was interested in the assembly of the flagellum. The base of the flagellum in each cell is the basal body, which also acts as the centriole during cell division, and is the microtubule organizing center for the cytoskeleton. But that thing is left behind when you remove the flagellum. So [basal bodies] were hard to purify. We were more interested in seeing the proteins than in necessarily purifying them. [But] what we realized early on was that the same peptide tag that we used to visualize them could also be an affinity tag if you want to purify the polypeptide.
What are you working on now?
Our main interest is trying to get high coverage and a lot of tagged lines. We’re also working on other ways to introduce these genes rather than go straight into the genome with a vector, [such as] using transposons to carry the CD cassette into genomic DNA, then transfecting that DNA into cells. We also have used a retroviral vector to deliver the CD set straight to the genome. If you can have a tagged gene that you can introduce into any cell, that’s got some virtue as opposed to when you tag the genome, you’re kind of stuck with the cell that you tagged. But if that cell is an embryonic stem cell, then you have the chance to get the gene tag into every tissue and cell type, and we’re very interested in going down that route too.
For others working in proteomics, what is the importance of your work?
It’s location, and the fact that we use live cells. We look at the behavior of the protein in the cell with high resolution microscopy over time. Most of proteomics — at least the part that uses 2D gels and mass spec, and now these are really the workhorse methods — is the opposite of a live cell, it’s extracted material. So the only way you can get the kinetics of what’s going on in the cell is to get a bunch of time points and compare what’s going on in the cell now and 15 minutes from now.
That’s the same advantage as if you tag a cDNA. But advantage number two is that the regulation of the gene is maintained. So if levels change because the transcript level changes or the translation changes of a given transcript — all that regulation we can see. Whereas if it’s a fusion between a cDNA and a GFP encoding DNA, you’re driven by an unnatural promoter in that situation, and you get overexpression and constitutive expression — you certainly don’t get natural expression.
But isn’t there an unnatural element in randomly introducing a tag — doesn’t it disrupt regulation?
Sometimes it does. But we’ve found that’s actually a minority of the time. It doesn’t mean, though, that in any one case you can predict whether it will. But the other way we look at it is that the average gene has almost 10 introns. So you have 10 different places for the tag to be in the gene and the protein. And not all 10 are usually going to inactivate it. We’ve found that most of the time the function is there, and most of the time the localization is correct if we know in advance what the localization should be.
What are the obstacles to getting full coverage of the genome this way?
First of all there’s the randomness of the insertion. You can get many thousands of genes but that doesn’t mean you can get them all without going to very large numbers. There is some preference for how the retroviruses insert, but the way to get around that is to use a family of retroviruses and each one seems to have a separate preference. A major issue for us is if you want to use GFP to localize, not every protein is abundant enough to see it, because we’re looking at the natural level. One way we address that is, we don’t just use GFP for the tags — we’ve started using luciferase-GFP fusions. With that, the sensitivity is much greater and you can be down to one or a few copies per cell. But that’s not a microscope measurement. So you’re not getting localization, but we can combine the two — once you know the protein is there, when you go to the microscope you can look harder.
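[Editor's sketch] The point about needing "very large numbers" of insertions can be illustrated with a simple coupon-collector calculation. The sketch below assumes every gene is equally likely to be hit by each insertion, which is an idealization (the answer above notes that retroviruses have insertion preferences, which would make real coverage slower). The gene count of 20,000 and the function names are hypothetical, chosen only for illustration.

```python
import math


def expected_fraction_tagged(num_genes: int, num_insertions: int) -> float:
    """Expected fraction of genes hit at least once, assuming each
    insertion lands in a uniformly random gene (an idealization)."""
    return 1.0 - (1.0 - 1.0 / num_genes) ** num_insertions


def insertions_for_coverage(num_genes: int, target_fraction: float) -> int:
    """Smallest number of insertions whose expected coverage reaches
    target_fraction under the same uniform assumption."""
    per_insertion_miss = math.log(1.0 - 1.0 / num_genes)
    return math.ceil(math.log(1.0 - target_fraction) / per_insertion_miss)


if __name__ == "__main__":
    genes = 20_000  # hypothetical gene count, for illustration only
    for n in (5_000, 20_000, 100_000):
        frac = expected_fraction_tagged(genes, n)
        print(f"{n:>7} insertions -> {frac:.1%} of genes expected to be tagged")
    print("insertions for 99% expected coverage:",
          insertions_for_coverage(genes, 0.99))
```

Under this model, as many insertions as there are genes covers only about 63 percent of them, and 99 percent expected coverage takes roughly 4.6 times the gene count, which is why the last few genes are the expensive ones.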
Are you going to be commercializing this technology?
There is a company that I’m a founder of that has the commercial rights to do these CD tagging technologies: Sequel Genetics. We certainly see commercial promise in it. One realm for sure is the cell-based assay realm, which could come directly from the company or the company could collaborate with some other entity which is interested in running internal cell-based assays. And then there’s the model where the company provides tagged genes to the community.
The original company [started in 1999] and later split into two companies: Spectragenetics and Sequel Genetics. Spectragenetics does genotyping using mass spec, and Sequel is on the proteomics side. That split happened in February.
It’s also very cool what [Spectra] does. We make peptides from the nucleic acid of interest, and it’s the peptide that goes into the mass spec. What you have when you have a mutation is a heterozygote — two copies that are different from each other. If you’re doing cancer genotyping, which is where Spectra is focusing its effort, in a lot of the tumor samples a mutant sequence may only represent 10 or 20 percent of the total, and that’s where the mass spec really has an advantage. If you have even 10 percent of the DNA molecules that have a different sequence, there will be a peptide in a different place in the mass spectrum. You don’t have a problem in principle in detecting that because you’ve got lots of dynamic range in the instrument. Because we do multiple reading frames, there’s usually at least one reading frame in which the mass of the mutant peptide is distinguishable sufficiently from the mass of the non-mutants. So it works — we’ve got several publications out on that, and more coming out.
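[Editor's sketch] The reading-frame argument above can be illustrated with a toy calculation. This is not Spectragenetics' actual protocol, whose details are not given here; it only shows why translating the same DNA in several frames helps. The two sequences, the 0.5 Da threshold, and the function names are hypothetical. The codon table is the standard genetic code and the masses are standard monoisotopic residue masses.

```python
# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def translate(dna: str, frame: int) -> str:
    """Translate dna in the given forward frame (0, 1 or 2); '*' marks a stop."""
    return "".join(CODON_TABLE[dna[i:i + 3]]
                   for i in range(frame, len(dna) - 2, 3))


def peptide_mass(peptide: str) -> float:
    """Monoisotopic mass of the peptide, truncated at the first stop codon."""
    residues = peptide.split("*")[0]
    return sum(RESIDUE_MASS[r] for r in residues) + WATER


def distinguishable_frames(wild: str, mutant: str, min_shift_da: float = 0.5):
    """Frames in which the mutant peptide mass differs by at least min_shift_da."""
    return [f for f in range(3)
            if abs(peptide_mass(translate(wild, f))
                   - peptide_mass(translate(mutant, f))) >= min_shift_da]


if __name__ == "__main__":
    wild = "GCTCTGGCAGGTCC"    # hypothetical sequence
    mutant = "GCTCTCGCAGGTCC"  # single G-to-C change at position 6
    for f in range(3):
        print(f, translate(wild, f), translate(mutant, f))
    print("frames with a detectable shift:", distinguishable_frames(wild, mutant))
```

In this toy case the change is silent in the first frame (both codons encode leucine), so that peptide's mass does not move, while the other two frames each swap an amino acid and shift the mass by about 98 to 99 Da.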
In terms of Sequel Genetics, when will you start to commercialize your products?
We’ll do it as quickly as we can, and we’ll do it right. It’s not a timetable of months, but it could be as short as a year.
How do you see the future of proteomics?
Proteomics doesn’t deal with phenomena. It deals with descriptions. To the extent that one can mine the data sets, there’s certainly a chance to get knowledge out of that that’s new. The endeavor is very worthwhile. But will there ever be a department of proteomics in a university? My thought is, you’re never going to see that. So it’s probably going to fall back into genetics eventually. It will morph into something else. I’ll bet you anything.
Two papers came out in Nature last week describing localization studies that UCSF researchers did on GFP-tagged yeast proteins. How does your work relate to those studies?
The Nature papers are very relevant, and very timely, with respect to what we are doing. The work described there is the equivalent of what we are doing in many essential respects, but in yeast. [For example], the real gene, rather than cDNA, is tagged, so there is natural regulation. [Also], GFP and epitope tags are used for localization and purification of the protein, and live-cell localization is done by fluorescence microscopy. A large fraction of the proteome proved to be accessible through fluorescence microscopy of live cells in which the protein is present at natural levels, and the great majority of the localization data obtained for GFP-tagged proteins showed that the tag does not dramatically alter localization and function in most cases.
The big difference comes from the vast difference in complexity between yeast and higher organisms. It is approximately meaningful to speak of the yeast proteome, but the equivalent “human proteome” is a much less useful concept. There is no doubt in my mind that a resource similar to what was reported in the Nature papers would be of huge value for human genes, and our ambition is to generate just such a resource.