Laboratory of Neurotoxicology, National Institute of Mental Health
At A Glance
Name: Jeffrey Kowalak
Position: Staff scientist, Laboratory of Neurotoxicology, National Institute of Mental Health, since 1998. Director of the laboratory's mass spectrometry facility; chair of the Association of Biomolecular Resource Facilities' Proteomics Standards Research Group, since 2005.
Background: Senior staff fellow, Section of Mass Spectrometry and Metabolic Analysis, National Institute of Child Health and Human Development, 1997-1998.
Senior fellow, Department of Biochemistry, University of Washington, 1994-1996.
PhD in biochemistry, University of Utah, 1994.
BS in biochemistry, University of Wyoming, 1987.
The ABRF's newly formed Proteomics Standards Research Group is gearing up for a new study that focuses on how well laboratories can identify proteins in a standard mixture (see ProteoMonitor 9/9/2005). This week, ProteoMonitor talked to Jeff Kowalak, the chair of the new sPRG, to find out more about his background, his current research, and his inspiration for the new ABRF study.
What is your background in terms of proteomics?
I was trained in Jim McCloskey's lab — Jim McCloskey is a world authority in post-transcriptional modifications of nucleic acids, specifically transfer-RNAs and ribosomal-RNAs. So when I was in graduate school — I started in the summer of 1987 — in the spring of 1988, both electrospray and MALDI ionization became commercialized. And those two ionization modes made it possible to volatilize large, thermo-labile biomolecules to get them into the gas phase. So that really brought revolutionary changes to protein mass spectrometry, but it also had the same impact on nucleic acid biochemistry. So my thesis project was to develop a methodology to map post-transcriptional modifications in very large rRNAs. The strategy is very similar to proteomics — take the large piece of nucleic acid, use sequence-specific endonucleases to cut it up, and then use mass spectrometry to determine the molecular weights of the oligonucleotides.
I graduated from there in 1994 and wanted to expand my horizons in biomolecules, so I went and did a postdoc at the University of Washington with Ken Walsh. While I was there, having a background in nucleic acids and specifically working with bacterial ribosomes, I started doing MALDI. I wanted to investigate MALDI analysis of mixtures of proteins. Because I had a background with ribosomes, I thought that the 30S ribosomal subunit would be a great place to start because it's very well described, 21 proteins. And the E. coli ribosome had been extensively studied since the 1970s, so I thought, 'This has to be a very well-defined mixture of proteins.'
I got started, and by MALDI, many of the proteins had molecular weights that agreed with the predicted gene sequences, but one of them didn't; it was off by 46 daltons, which is an unusual number to be off by. I started looking into the literature, and it turned out that it was E. coli ribosomal protein S12. Researchers in a laboratory at the Max Planck Institute in Germany had studied this protein for 10 years, and they knew it had an unusual post-translational modification at position 88, but they couldn't describe what it was. So I used some modern mass-spec techniques and some microscale derivatization to show it was a modified aspartic acid. That was described in a Protein Science paper in 1996.
Then on a personal note, my wife is from the DC area — she was born and raised in Alexandria, Va. She's an MD with the same training as my father-in-law. He had a health problem in the mid-'90s and had to retire, so my wife took over his practice. I lined up a second postdoc at the NIH here, in Sandy Markey's lab in mental health. I was in Dr. Markey's lab for a year, and then I was recruited into child health as a staff scientist. I stayed in child health for a year, and then was recruited back into Sandy Markey's lab in the same capacity, and I've been here ever since Christmas of 1998.
What were you working on when you first got to the NIH?
When I first got here, I started to work on electrospray ionization on a very old FT-MS instrument that turned out to have more problems than potential. So that wasn't really a productive time.
In 1999, I rejoined Dr. Markey's laboratory to establish a proteomics effort in this institute.
So now we do a variety of everything. We had MALDI technology at one time, but have shifted almost exclusively to LC-MS, because the advantages that accrue with liquid chromatography are just very difficult to beat.
Also in my time in Sandy Markey's lab, we have developed automated 1- and 2-D nanoflow HPLC systems. One aspect of that work is that about five years ago we established a Cooperative Research and Development Agreement, or CRADA, with Shimadzu. In a nutshell, Shimadzu has stationed an engineer in our lab, and we come up with novel HPLC ideas, and working together with Shimadzu, we turn them into viable products.
Did they turn your nanoflow HPLC into a product?
Yes. It's a 2D HPLC system that you can actually buy from Shimadzu now.
Any other products?
The 2D HPLC system is the big one. Right now the newest push in proteomics is towards quantitation. And one problem with the mass spec community is that they perceive every problem as a mass-spec problem. While mass spectrometry is providing lots of information and lots of solutions towards the quantitation problem, we actually backed up and took the approach of doing a pre-column fluorescence derivatization. In a continuing cooperative agreement with Shimadzu and the bioengineering resources here at the NIH, we have built a custom laser bench and are doing capillary laser-induced fluorescence detection. With this fluorescence labeling method, peptides are labeled with a primary amine-reactive fluorophore. The peptides are separated on a column, and as they elute, a laser does the fluorescence excitation. We excite and measure the emission, and record that data on a computer. Since we know the molar absorptivity of the fluorophore, we can rearrange the Beer's law equation and derive the concentration.
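The concentration calculation he describes is the Beer-Lambert relationship, A = ε·l·c, rearranged to solve for c. A minimal Python sketch of that rearrangement (the fluorophore parameters below are illustrative values, not numbers from the interview):

```python
def concentration_from_signal(absorbance, molar_absorptivity, path_length_cm):
    """Rearrange Beer's law (A = epsilon * l * c) to solve for concentration.

    absorbance         -- measured signal, dimensionless
    molar_absorptivity -- epsilon, in L / (mol * cm)
    path_length_cm     -- optical path through the capillary, in cm
    Returns concentration in mol/L.
    """
    return absorbance / (molar_absorptivity * path_length_cm)

# Illustrative only: a fluorophore with epsilon = 75,000 L/(mol*cm)
# measured across a 50-micron (0.005 cm) capillary.
c = concentration_from_signal(0.0015, 75_000, 0.005)
print(f"{c:.2e} M")  # 4.00e-06 M
```

In practice the fluorescence emission intensity is first calibrated back to an effective absorbance, but the algebra of the final step is as above.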
Shimadzu will ultimately commercialize the technology, but I don't know when.
What other technologies are you developing?
Several years ago, we recognized that biopharma and those types of organizations have enough resources to develop their own IP solutions, but once those products get made, they never seem to make it into the public domain. As government scientists, all of our work is in the public domain, so we decided that we would start to make bioinformatics tools that would allow us to handle the very large data outflows that are common with proteomics experiments.
We developed a software package called DBParser. It's a tool that takes output from Mascot searches and pumps that information into a relational database; then we have a series of report generation scripts that allow you to extract and examine the data in various forms. For example, one of the forms is a parsimony analysis. If you take the average output from a Sequest report or a Mascot report, in my view that represents the maximum number of proteins that can account for the peptide data, because the peptides that identify one protein often will identify many other closely related proteins. What we asked was, can we come up with some rules that will give us the minimum list of proteins that accounts for all of the peptide data observed in any given proteomics experiment.
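The parsimony idea Kowalak describes amounts to a minimum set-cover problem: find the smallest set of proteins whose peptides jointly explain every observed peptide. A simple greedy sketch of that idea (this is not the published DBParser code; the protein and peptide names are illustrative):

```python
def parsimonious_proteins(protein_to_peptides):
    """Greedy set cover: repeatedly pick the protein that explains the most
    still-unexplained peptides, until every observed peptide is accounted for."""
    uncovered = set().union(*protein_to_peptides.values())
    chosen = []
    while uncovered:
        # Protein explaining the most remaining peptides.
        best = max(protein_to_peptides,
                   key=lambda p: len(protein_to_peptides[p] & uncovered))
        if not protein_to_peptides[best] & uncovered:
            break  # no protein explains the leftovers
        chosen.append(best)
        uncovered -= protein_to_peptides[best]
    return chosen

# Two homologous proteins share peptides; one of them suffices.
hits = {
    "P1": {"pepA", "pepB", "pepC"},
    "P2": {"pepB", "pepC"},   # subsumed by P1, so never chosen
    "P3": {"pepD"},
}
print(parsimonious_proteins(hits))  # ['P1', 'P3']
```

Greedy set cover is a heuristic, not guaranteed minimal in every case, but it captures the "concise list" behavior described above.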
We did that, and our DBParser software as it was published in 2004 has parsimony analysis routines that try to give a concise list of proteins accounting for all of the peptides observed in any given experiment. And we continue to expand that idea, and are now working on an integrated bioinformatics workbench that not only parses the output of the Mascot search engine, but also works on OMSSA — the Open Mass Spectrometry Search Algorithm — a new sequence library search tool that was developed by Lewis Geer at NCBI. So our workbench will take the output from Mascot, OMSSA, or X!Tandem, parse those search outputs into a relational database, and allow you to do report generation. Essentially the idea there is one of metasearching: if multiple search engines come up with the same solution, we take that as a kind of passive evidence of independent validation.
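The metasearch validation step he sketches reduces to intersecting the identifications reported by each engine. A minimal illustration (hypothetical peptide IDs, not actual search results):

```python
def consensus_ids(*engine_results):
    """Peptide identifications reported by every search engine.
    Agreement across engines serves as passive cross-validation."""
    return set.intersection(*(set(r) for r in engine_results))

# Hypothetical identifications from three engines.
mascot  = {"PEPTIDEA", "PEPTIDEB", "PEPTIDEC"}
omssa   = {"PEPTIDEB", "PEPTIDEC", "PEPTIDED"}
xtandem = {"PEPTIDEB", "PEPTIDEC"}

print(sorted(consensus_ids(mascot, omssa, xtandem)))  # ['PEPTIDEB', 'PEPTIDEC']
```

A real workbench would compare at the spectrum-to-peptide assignment level, with score thresholds per engine, but the consensus logic is the same.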
We also have an iterative scheme built into that workflow for tracking down post-translational modifications. Ultimately, we use a de novo strategy, because we know that given any dataset, if we run it through a sequence library search engine, about half of the spectra that remain after the search are actually very high-quality spectra that for one reason or another weren't assigned. So we do this iterative search to sequentially look for various post-translational modifications, and then after that process is done, we use a spectral quality filter to tell us how many of the remaining unassigned spectra are still very high quality and merit further investigation. Once we have those sorted, we use a de novo tool to either get a candidate sequence or a sequence tag. Then we do an automated BLAST search to try to find that sequence or related sequence fragments so we can ultimately figure out what is going on — why there's a high-quality mass spectrum with no apparent sequence solution. Very often we find that some sort of chemical PTM has occurred, quite often deamidation or N-terminal cyclization.
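The iterative workflow outlined above — search, re-search the leftovers one modification at a time, quality-filter, then hand survivors to de novo sequencing — can be sketched as a loop. Every function name here is a placeholder for a real tool (a Mascot/OMSSA search, a spectral-quality scorer, a de novo sequencer), not code from the group:

```python
# Sketch of the iterative identification workflow, with placeholder callables.
MODIFICATIONS = ["deamidation", "n_terminal_cyclization", "oxidation"]

def iterative_search(spectra, search, quality_score, de_novo, threshold=0.8):
    """search(spectra, modifications) -> (assigned, unassigned) is assumed.
    quality_score(spectrum) -> float and de_novo(spectrum) -> sequence/tag
    are likewise hypothetical stand-ins for real tools."""
    assigned, unassigned = search(spectra, modifications=[])
    # Sequentially re-search the leftovers, one extra modification at a time.
    for mod in MODIFICATIONS:
        hits, unassigned = search(unassigned, modifications=[mod])
        assigned.extend(hits)
    # Only high-quality leftovers merit expensive de novo interpretation.
    worth_pursuing = [s for s in unassigned if quality_score(s) >= threshold]
    candidates = [de_novo(s) for s in worth_pursuing]
    return assigned, candidates
```

The point of the sketch is the control flow: each pass shrinks the unassigned pool, and the quality filter keeps the de novo stage from wasting effort on noise.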
We are doing these types of searches so that we can ultimately define rules that will help us search more intelligently. We don't want to have to go through these things manually. We want to find rules that define natural biological modifications and chemical post-translational modifications that happen routinely, and incorporate those rules into intelligent sequence library searching so that we can ultimately extract the maximum sequence information out of the datasets that we have.
How much extra information do you think you can get with these 'more intelligent' searches?
Oh, I think easily another 50 percent. We're finding quite a bit of extra information in the efforts we have put forth so far.
Do you plan on releasing this new software on the web?
What led you to lead the new Proteomics Standards Research Group study by the ABRF?
Last year, the NIH, NIST, and the Institute for Systems Biology sponsored a two-day workshop on gas phase fragmentation and algorithms to interpret fragmentation data. And in the course of that workshop, it became evident that we really had a chicken-and-egg problem. That is, there is no effective way to evaluate independent sequence library search algorithms without having a dataset that is highly defined. And the only way to generate a highly defined dataset is to have a highly defined sample. But there is no biological sample that you can purify to the point where its constituents are completely defined. So we decided to make one.
I was already a member of the ABRF's Proteomics Research Group, and the ABRF is a very forward-looking organization. They realized that the field of proteomics has many sub-disciplines, and that one of those sub-disciplines is a focus on standards for proteomics. So I wrote a letter to the ABRF executive board explaining the need for this type of expansion of the PRG, and Bill Lane at Harvard, who was our executive board liaison to the PRG, and I were basically the drivers for the Proteomics Standards Research Group, or sPRG.
We formed that group in early 2005, and we have joined with a corporate sponsor, whose name I can't disclose, to produce a 50-protein mixture. We are nearing completion of that mixture and will release it as a study sample. We are announcing the study and taking requests for samples now.
Depending on the outcome of the study that Chris Turck and my colleagues on the PRG are conducting, there is also a high likelihood that a Quantitative Proteomics Research Group will form in the near future; its designation will be qPRG.
What are your plans for future research?
I'm very happy where I'm at. While I work for Sandy, I have a lot of autonomy as to what it is that I do. In addition to all my other hats, I am responsible for running a proteomics facility within the Laboratory of Neurotoxicology. So I manage the daily flow of samples from postdocs and visiting scientists working in the Laboratory of Neurotoxicology. I'm also responsible for coordinating collaborations with other labs within our institute and other institutes on the NIH campus.
Our collaboration with Shimadzu is now five years old, and the particular engineer we've worked with is approaching the end of his visa stay, so the future of that CRADA is not clear at this point in time.
With respect to our bioinformatics interest, we've done pretty well in that vein and have gone so far as to hire a staff-level programmer, and we continue to attract postdocs who have an interest in bioinformatics.