At A Glance
Name: Eric Phizicky
Position: Professor of Biochemistry and Biophysics, University of Rochester School of Medicine, since 1987 (full professor since 2002)
Prior Experience: Postdoc, California Institute of Technology, 1983-86
PhD, Cornell University, 1983
BSc, Biochemistry, McGill University, 1976
How did you get involved in studying protein function on a genomic scale?
I have a BSc in biochemistry from McGill University in Montreal, and I went to graduate school at Cornell, where I got a PhD in biochemistry in 1983, working with Jeff Roberts. [During my PhD], I was working on SOS induction, in which phage lysogens are induced in the presence of DNA-damaging agents to replicate and escape from E. coli. Upon DNA damage, the E. coli RecA protein, which normally works in recombination, mediates cleavage of the phage repressor. When that happens, the phage can turn on expression of its genes, replicate, and escape the host. The year before I got there, restriction enzymes appeared at Cornell [for the first time]. It was right at the beginning of molecular biology.
After that, I went to John Abelson’s lab at Caltech as a postdoc, where I switched to yeast and started working on tRNA splicing. I was purifying proteins involved in tRNA splicing in order to clone their genes so that I could study their functions.
Then I moved to Rochester at the beginning of 1987 as an assistant professor, and I have been here since. I continued studying tRNA splicing, which meant again identifying the [biochemical] activity, purifying the protein, and cloning the gene. I spent almost seven years doing that.
At Caltech, we had to grow several very large fermenter batches to purify enough protein to sequence it. It was a 300-liter fermenter, so we started with 1 or 2 kg of cells. It takes a lot of time, and it is hard to convince graduate students to spend long stretches in the cold room purifying proteins.
Around that time the genome projects were going on, and my wife Beth Grayhack and I were talking to Stan Fields [who invented the yeast two-hybrid system]. He was asking us, ‘What would you do with the genes when the genome sequence came out?’ And we said, ‘We would make all the proteins.’ We kept talking to him over the years, and in 1997 we decided that it actually would be a good idea [to] make all the proteins. We didn’t have money to do it, so we e-mailed a bunch of yeast labs asking whether they would be interested in obtaining such a collection if we could make it. They were all interested, and we used that to convince Research Genetics to give us the oligos for very little money. Then we built [a yeast protein expression library] together with Stan [Fields]. It was not a perfect library, but it turned out to be amazingly useful for finding the gene whose product catalyzes a given activity.
What does the library look like, and how complete is it?
It’s a library in which each of 6,144 yeast strains expresses a single yeast protein. It derives from the original set of primers that the Fields laboratory used to make their first genomic two-hybrid screen. We generally use it in pools: we grow groups of 96 strains and then purify the proteins expressed from them in a single step. What we store in the freezer is 64 tubes, each containing purified proteins from 96 strains.
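The pooling arithmetic described above (6,144 strains, 96 strains per pool, 64 pools) can be sketched as follows. This is a minimal illustration only, assuming strains are numbered consecutively and assigned to pools in order, one standard 8 × 12 (96-well) plate per pool; the numbering scheme and the function name `locate_strain` are hypothetical, not the lab's actual plate map.

```python
# Hypothetical layout (assumption): 6,144 strains numbered 0..6143, assigned in order
# to 64 pools of 96, each pool corresponding to one 8 x 12 plate read row by row.
TOTAL_STRAINS = 6144
STRAINS_PER_POOL = 96
ROWS, COLS = 8, 12  # standard 96-well plate: rows A-H, columns 1-12

NUM_POOLS = TOTAL_STRAINS // STRAINS_PER_POOL  # 64 pools, i.e. 64 tubes in the freezer


def locate_strain(strain_index: int) -> tuple[int, str]:
    """Map a 0-based strain index to (1-based pool number, well name such as 'B7')."""
    if not 0 <= strain_index < TOTAL_STRAINS:
        raise ValueError("strain index out of range")
    pool, offset = divmod(strain_index, STRAINS_PER_POOL)
    row, col = divmod(offset, COLS)
    return pool + 1, f"{'ABCDEFGH'[row]}{col + 1}"


if __name__ == "__main__":
    print(NUM_POOLS)            # 64
    print(locate_strain(0))     # (1, 'A1')
    print(locate_strain(1247))  # (13, 'H12')
```

Keeping a fixed strain-to-well map is what makes a pool-level hit traceable: once a pool is known to be active, only its 96 wells remain as candidates.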
What kinds of biochemical activities have you assayed the proteins for?
At the University of Rochester, in the last three to four years, we have identified genes associated with more than 38 activities, [including] just about every type of biochemical reaction: a number of different transferases, several oxidoreductases, ligases, nucleases, binding activities, and synthetases. This is the same sort of thing that I started doing as a postdoc. I would have a biochemical activity, and I would want to purify the protein to identify its gene. Instead of taking two or three or four years, it now takes a couple of days to see if one of these pools has the activity. Immediately after that, you can go and find out which strain from that pool is responsible for the activity. It’s enormously faster than the previous methods.
[Also], this approach is solution-based, so you can assay any type of activity. [Protein] microarrays are great for binding, [but] for activities where you convert a substrate to a product, it’s easier to do this [in solution].
How easy or difficult is it to get to one protein from the pool you start with?
If [we saw] the activity, we have always been able to find the gene responsible. [It depends on whether] you see the first signal. We don’t always see the first signal that allows us to say, ‘OK, it’s in pool 13,’ but once we do see it, it’s easy to find the responsible protein.
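Once a pool shows the activity, the responsible strain still has to be picked out of its 96 members. The interview does not say how the lab does this; one common scheme, assumed here purely for illustration, is to re-assay sub-pools made from each of the 8 rows and 12 columns of the plate (20 assays instead of 96), with the active strain sitting where a positive row meets a positive column. The `assay` callback and the function name `deconvolve_pool` are hypothetical stand-ins for a real biochemical measurement.

```python
from typing import Callable

ROWS = "ABCDEFGH"    # 8 rows of a 96-well plate
COLS = range(1, 13)  # 12 columns


def deconvolve_pool(assay: Callable[[set[str]], bool]) -> list[str]:
    """Return candidate wells by assaying 8 row sub-pools and 12 column sub-pools.

    `assay` is a hypothetical callback (assumption): it receives the set of wells
    combined into one sub-pool and reports whether the activity is detected.
    """
    positive_rows = [r for r in ROWS if assay({f"{r}{c}" for c in COLS})]
    positive_cols = [c for c in COLS if assay({f"{r}{c}" for r in ROWS})]
    # Each active strain lies at the intersection of a positive row and a positive
    # column. With more than one active strain, some intersections can be false
    # positives and would need to be re-tested individually.
    return [f"{r}{c}" for r in positive_rows for c in positive_cols]


if __name__ == "__main__":
    # Simulated pool in which only the strain in well C7 has the activity.
    def simulated_assay(wells: set[str]) -> bool:
        return "C7" in wells

    print(deconvolve_pool(simulated_assay))  # ['C7']
```

The design trades a second round of 20 small assays for never having to test all 96 strains one by one; it works best when a pool contains a single active strain.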
What if you need more than one protein to get an activity?
This system works just fine when you need two proteins for a single activity, for the same reason that co-immunoprecipitation works. When we purify a particular protein, we do it under conditions that allow other proteins to co-purify with it. Going from two proteins to multiple proteins makes no difference: what’s true for a two-protein complex is true for a five-protein complex or even a larger complex. You are limited only by the amount of the complex, which in turn is limited by the [protein] present at the [lowest] concentration.
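The last point is essentially a limiting-reagent argument. A minimal sketch, assuming idealized, complete assembly and a known stoichiometry for each subunit (real complexes rarely assemble completely, and the subunit names and concentrations below are made up): the most complex you can recover is set by whichever subunit runs out first.

```python
def max_complex(concentrations: dict[str, float], stoichiometry: dict[str, int]) -> float:
    """Upper bound on complex concentration, assuming complete assembly (idealization).

    concentrations: available concentration of each subunit (any consistent unit).
    stoichiometry: copies of each subunit per complex.
    """
    # Each subunit can support at most concentration / copies complexes;
    # the scarcest subunit sets the ceiling for the whole complex.
    return min(concentrations[s] / n for s, n in stoichiometry.items())


if __name__ == "__main__":
    # Hypothetical five-subunit complex in which subunit "E" is the scarcest.
    conc = {"A": 10.0, "B": 8.0, "C": 12.0, "D": 9.0, "E": 0.5}
    stoich = {"A": 1, "B": 1, "C": 2, "D": 1, "E": 1}
    print(max_complex(conc, stoich))  # 0.5
```

With one copy of each subunit this reduces to the interview's statement: the complex can be no more abundant than its lowest-concentration component.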
Have you improved the library since your publication in Science in 1999?
[Several of us here at Rochester], in collaboration with Mike Snyder at Yale University and Erin O’Shea and Jonathan Weissman at UCSF, are building a ‘version 3’ library, which will be improved in a number of ways. It’s [going to be] a C-terminally tagged library, which makes it useful for the estimated 1,000 or so membrane proteins, which often require a native N-terminus to insert into the membrane. It will comprise the updated set of ORFs, the revised version, vintage 2002. Every gene will be sequenced, and they will all be under very tightly regulated control. [The library] is being built with Gateway technology, which allows you to move the insert containing the open reading frame from one plasmid to another very simply. We are just finishing our first pass. There is probably another round where we have to go back and get those that don’t have the right sequence, and then we have to move them into yeast and test them. I guess [it will be ready] sometime in 2004.
Are you planning to make this library available to other researchers?
Yes, absolutely. We have talked to Invitrogen, and they have no problem [with us] sending it to academic labs.
Is the determination of protein function somewhat underrated in proteomics?
Everyone is interested in monitoring protein expression, and that will be great: you will be able to type diseases or stages of a disease by different expression profiles. The functional part that we are interested in is more fun. There are 2,000-3,000 genes even in yeast that no one knows very much about, and they all have some biochemical activity. You could imagine using this sort of approach to identify novel biochemical activities, maybe [those] of commercial interest, or simply to understand connections that weren’t [known before].
What are the remaining challenges?
One of the problems is this multitude of protein modifications and alternative splice products. If you go into higher organisms [than yeast], the problem is massively more complicated. It’s not just more genes, it’s the multiplicity of splice products, it’s the difficulty of defining the ORFs in the first place, and then all the modifications. You might have to build a library of maybe a million proteins. At the analytical end, [you need to find] very specific probes, so that you could say not just that the concentration of this particular protein went up tenfold, but that it’s the phosphorylation of serine 32 that went up. For signal transduction pathways and many other processes, where it’s the phosphorylation state or some other modification that changes, and only at particular residues, that becomes difficult. It’s also a problem for studying protein function, not just for analysis, [for example when] the protein you are interested in only has [a] function when it’s phosphorylated or methylated or acetylated.
Can this be done with the current technologies?
I don’t think there is any reason not to be able to do it. It’s just harder because you have to first know where the phosphorylations or modifications are taking place, and then build a separate set of reagents that discriminates that form of the protein from the unmodified form. So the problem just goes up in scale in terms of analysis.
What projects are you involved in now?
We are part of a structural genomics project headed by Wim Hol at the University of Washington to look at [proteins of] pathogenic protozoa. This is a structural biology project to purify and analyze all the proteins in organisms such as Plasmodium falciparum, Leishmania species and Trypanosoma species that cause major health problems. That’s gotten us into the business of high-throughput cloning and purification, which leaves us with many hundreds, and soon many thousands, of proteins, just looking for functions. If you want to know what the real new frontier is, to me it is, ‘Here are 1,000 Plasmodium falciparum proteins; what do they do?’ That’s in an organism where you can’t do genetics. It raises the question, ‘With all these proteins, can you think of ways to define their function?’