Project leader, bacterial structural genomics program
Biotechnology Research Institute, National Research Council of Canada
At A Glance
Name: Allan Matte
Position: Project leader, bacterial structural genomics program, Biotechnology Research Institute, National Research Council of Canada, since 2000.
Background: Research associate in protein crystallography, Mirek Cygler's laboratory, Biotechnology Research Institute, National Research Council of Canada, 1998-2000.
Postdoc, Zippora Shakked's laboratory, department of structural biology, Weizmann Institute of Science, Rehovot, Israel, 1996-1998.
PhD in protein crystallography, department of chemistry, University of Saskatchewan, 1996.
At last week's Canadian Proteomics Initiative conference in Toronto, Allan Matte gave a talk on combining X-ray crystallography and functional analyses to gain insight into Escherichia coli proteins of unknown function. ProteoMonitor caught up with Matte to ask him to describe his X-ray crystallography project in more detail.
How did you get into doing functional genomics, and what is the project that you're currently working on?
My PhD was in protein crystallography, and this is a program on bacterial structural genomics funded by the Canadian Institutes of Health Research to develop high-throughput methods for parallelized protein structure determination by X-ray crystallography and NMR spectroscopy. It's a five-year program that was started in 2000.
X-ray crystallography has traditionally not been a high-throughput method. How is the current project turning it into a high-throughput method?
The main issue is the ability to generate sufficient numbers of crystals of different proteins so that you can determine the structures, because it's truly the crystallization that is the rate-limiting component. The structure determination can be done very quickly, but if you don't have enough good crystals, you're not able to determine structures. A lot of what we do goes into generating good-quality protein samples, and then suitable crystals for X-ray analysis.
How is that done?
Well, our interests are geared toward E. coli genomes. So what we do is clone genes from various E. coli genomes, 96 at a time, so that's really high throughput. We expression-profile them in groups of 96, and they're put into three different expression vectors. We test all these vectors at the same time. Then we take the best-expressing clones from these groups of 96 and produce them on a larger scale in order to purify the protein. Once the protein is purified, it's characterized, its behavior in solution is optimized if necessary, and then it's screened for the formation of protein crystals.
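The selection step Matte describes (testing each gene in three expression vectors and carrying the best-expressing clone forward) can be illustrated with a minimal sketch. This is not the group's actual software; the gene names, vector names, expression scores, and the `min_score` cutoff are all hypothetical and serve only to show the bookkeeping involved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExpressionResult:
    gene: str      # hypothetical gene identifier
    vector: str    # one of three hypothetical expression vectors
    score: float   # hypothetical expression level, arbitrary units


def best_clones(results, min_score=1.0):
    """Return {gene: best result} for genes with a clone scoring >= min_score.

    min_score is an assumed cutoff; genes whose clones all fall below it
    are dropped rather than scaled up.
    """
    best = {}
    for r in results:
        if r.score < min_score:
            continue
        current = best.get(r.gene)
        if current is None or r.score > current.score:
            best[r.gene] = r
    return best


# Two hypothetical genes, each tested in three hypothetical vectors.
plate = [
    ExpressionResult("geneA", "vector1", 0.2),
    ExpressionResult("geneA", "vector2", 3.1),
    ExpressionResult("geneA", "vector3", 1.4),
    ExpressionResult("geneB", "vector1", 0.0),
    ExpressionResult("geneB", "vector2", 0.4),
    ExpressionResult("geneB", "vector3", 0.1),
]

selected = best_clones(plate)
for gene, r in sorted(selected.items()):
    print(gene, r.vector, r.score)
```

In this toy plate only geneA has a clone above the assumed cutoff, so it alone would move on to larger-scale production and purification.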
Is this only being done in E. coli, or is it being done on other organisms as well?
Well, in this particular group, the emphasis is on proteins from E. coli. In other groups in Canada, the US, and internationally, there are lots of different organisms being looked at with a similar approach.
There are two parts. One part is the methodology, which is important. The idea is that if you can develop the methodology, you can, in principle, apply it to any number of different organisms. So E. coli is sort of being used as a model. But the other component is that there is interesting and important biology related to E. coli, both from a basic perspective and from the point of view of looking for proteins that are potential therapeutic targets. So there are these two components, the methodology component and the science component, which go hand in hand.
How far along is the project?
It's actually in its fourth year. It'll be entering its final year for this particular grant soon. Most of the methods are at a sort of mature stage. We still tweak different things. We try different things to see if we can improve things in the pipeline, but it's sort of operating more or less at a production scale at the moment.
The overall project has produced over 50 crystal and NMR structures up to this point, and that's for the two NMR groups and the two X-ray crystallography groups that are involved in this project.
The ultimate goal is to come up with a scheme that allows one to do more structures in a given time, and reduce the cost per structure. As I said, there's the science that comes out of it. There have been 30 or 40 publications in journals from the structures that have been determined. There's a lot of different biology that goes with it. Some proteins we've followed up on by doing site-directed mutagenesis and characterization of enzymatic activity. So there's a lot of other stuff that's sort of going hand in hand with this. Eventually, these things will probably spin themselves off as individual research programs. At least that's our hope.
Are the proteins that have been chosen to be crystallized especially significant in some way?
They're proteins that represent a large family of other sequences. So they're individual proteins themselves, but they represent a whole family of other sequences in the sequence databases from which you can infer the other sequences' structures if you know the structure of one. So the information is expandable to a lot of other things.
The other thing is that some of the targets that are selected are common to various pathogenic bacteria, not just E. coli, but other pathogens. Other targets represent basic metabolic enzymes which are fundamental to housekeeping functions. So they have a commonality which is very broad.
Out of the proteins that have been crystallized, are there any therapeutic targets?
At the moment there are no concrete therapeutic targets. By concrete therapeutic target, I would say that's something which one can demonstrate has a direct causal relationship to a particular disease state induced by an organism. Some of them might turn out to be that way, but it'll be up to others to show that relationship. Our primary interest is in the structure determination.
After you've crystallized the proteins, how do you go about further determining their function?
What we can do is compare the structure of the protein molecule to other known structures, and to see what's similar and what's different. Proteins are much more conserved at the structural level than they are at the sequence level. What that means is that proteins that have very different sequences may have very similar structures. When you compare to other structures, you may find structures that are able to bind a particular type of substrate or ligand. And if your structure resembles that, it's an indication that your structure may have that relationship to those that are already known.
Based on that kind of starting point, one can try different types of biochemical assays to look for the activity that you think the protein might have.
The other thing we do is we use mass spectrometry to look for the formation of products from substrates with certain proteins. This is another method which you can apply. You don't need a specific assay because you're just looking for the molecular-weight difference between the starting material and the product.
All this is framed within the context that the measurements that you're doing are in vitro. They don't necessarily tell you what's going on inside the cell. To really get any full understanding, you need to bring to bear all types of experiments, and the structural part is just one component of it. For us it might be an endpoint, but for somebody else it's a place to begin.
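The mass-spectrometry approach mentioned above, looking only at the molecular-weight difference between starting material and product, can be sketched roughly as follows. This is an illustrative sketch, not the group's method: the substrate and product masses and the tolerance are hypothetical, and comparing the observed shift against a short table of common monoisotopic mass shifts is an assumption about how such a difference might be interpreted.

```python
# Monoisotopic mass shifts (in Da) for a few common chemical changes.
KNOWN_SHIFTS = {
    "methylation (+CH2)": 14.01565,
    "acetylation (+C2H2O)": 42.01057,
    "phosphorylation (+HPO3)": 79.96633,
    "dehydration (-H2O)": -18.01056,
}


def match_shift(substrate_mass, product_mass, tolerance=0.01):
    """List known changes whose mass shift matches product minus substrate.

    tolerance (in Da) is an assumed instrument accuracy.
    """
    delta = product_mass - substrate_mass
    return [
        name
        for name, shift in KNOWN_SHIFTS.items()
        if abs(delta - shift) <= tolerance
    ]


# Hypothetical measured masses for a starting material and its product.
substrate = 500.0000
product = 579.9663

candidates = match_shift(substrate, product)
print(candidates)
```

A matching shift only suggests what kind of reaction may have occurred; as noted above, these are in vitro measurements and would still need to be confirmed by other experiments.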
Are you involved with the Protein Structure Initiative, which aims to get the structure of one member of each protein family?
No, we're not. We're not directly connected to any of the NIGMS programs.
The NIGMS programs had two components as well, like us. They were to develop technologies, similar to what we wanted to do, although they're on a much bigger scale because they're funded at a much higher level. But the other component was, they wanted to obtain structures for representative protein sequences that would cover all of the protein-sequence base. So that's like having one structure for each sequence family. It's a good goal, but that's a much bigger goal than what we have.
How much funding do you have?
This program was funded at approximately CA$600,000 per year for five years. This particular program has no renewal. It's not a renewable program.
What will happen to it after this last year?
That's a good question. That's why we're trying to develop specific areas within the existing program that will have enough initial results that they can be developed into new programs.
We're hoping that specific areas of interest will become part of new programs. I think the good thing is that even though the program will finish, the technologies that have been developed will be there, and now they can be applied to these new types of programs, so it's not like everything will just disappear.
In terms of the work that you've done so far, how is that applicable to humans?
It could be applicable to humans. One example is that there have been several companies started in the US that use high-throughput structure determination to develop therapeutics, like Serux and XCS in California. It all comes down to what kinds of proteins you're looking at. They're looking for proteins for very specific human disease conditions. It's the same sort of principle, but with a different goal.
Is your technology different from the PSI technology?
Well, as for the steps in the whole procedure, everybody has done them in slightly different ways. The common feature of all of these approaches was to parallelize the steps so that instead of doing one protein or gene at a time, you would handle dozens or hundreds at a time. The other thing that's common among all the programs, including ours, is the need or desire to introduce automation where appropriate in the pipeline. So if a liquid-handling robot can do the things you want it to do, you make use of that technology.
But the PSI centers are funded at a much higher level, and they're really, truly protein-production factories. We're a much smaller sort of operation.
Where do you think the research will go from here?
It's sort of maturing into specific areas that define groups of proteins. We're going to continue in the area of complex carbohydrate metabolism, because we've done several structures for that. We'll probably continue in the area of RNA-modifying proteins, and also proteins that are found in bacteria but not in mammals, since they could potentially be useful as therapeutic targets.
Are you planning on applying the techniques to other organisms?
Yes, but we need the resources to do it. I think one of the areas that we'll move into is protein-protein interaction, looking to determine the structures of binary protein complexes from E. coli genomes. There's quite a bit of data now on these proteins, both from bioinformatics analysis of sequences and from concrete experimental results in the literature. So that's one of the areas we'll go into.