By Adrienne J. Burke
When the mass spectrometrist and two computer scientists who work in a protein analysis lab in the chemistry department at the University of Colorado, Boulder, got into shotgun proteomics last year, they did reconnaissance at one of the most high-throughput facilities of all — the Pacific Northwest National Laboratory. Never mind that the huge PNNL shop has a fleet of state-of-the-art mass specs and a Linux supercomputer dedicated to Sequest searches. The little Boulder team, with one dedicated mass spectrometer and a handful of computers, thinks big.
Mass spec technician Lauren Wolf says the group is in “pre-high-throughput,” but her colleague Alex Mendoza, a computational biochemist working to automate management of the data she generates, says he’s readying for a day when they’re dealing in hundreds of gigabytes of data.
With grants from NIH and the Howard Hughes Medical Institute, Mendoza, Wolf, and computer scientist Karen Meyer-Arendt, under the direction of co-principal investigators Natalie Ahn and Katheryn Resing, are developing an Oracle database to support shotgun proteomics, writing their own protein database search programs and direct spectral analysis methods to improve protein identification, and perfecting sample-prep methods for a technique they call “panning for proteins.”
The trouble with current shotgun methods, Wolf says, is that “you make a lot of data and you just put it into the tools that are out there, but you don’t necessarily get a clear picture of how good or bad your data is.” The principal protein-peptide matching programs, Mascot and Sequest, aren’t interoperable, sample prep methods vary, and data tracking is a nightmare. The CU team is addressing each of those issues to differentiate good from bad hits and to make more sense of ambiguous ones. Says Meyer-Arendt, “We want to see what the ideal configuration is from the beginning to the end of the pipeline.”
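One way to work around engines that aren’t interoperable is to translate each engine’s output into a shared record before comparing hits. The sketch below is illustrative only, not the CU team’s code: the field names (`pep_seq`, `ions_score`, `xcorr`, and so on) are invented stand-ins for whatever the real Mascot and Sequest exports contain, though the scores themselves (Mascot ions score, Sequest XCorr) are real and live on different scales.

```python
from dataclasses import dataclass

@dataclass
class PeptideHit:
    """Engine-neutral record for one peptide-spectrum match."""
    peptide: str
    protein: str
    score: float   # engine-specific score, kept only for reference
    engine: str

# Hypothetical converters: the input field names are illustrative,
# not the actual Mascot or Sequest output formats.
def from_mascot(row):
    # Mascot reports an ions score; higher is better.
    return PeptideHit(row["pep_seq"], row["prot_acc"],
                      float(row["ions_score"]), "mascot")

def from_sequest(row):
    # Sequest reports XCorr; also higher-is-better, but on a
    # different scale, so the scores are not directly comparable.
    return PeptideHit(row["peptide"], row["reference"],
                      float(row["xcorr"]), "sequest")

hits = [
    from_mascot({"pep_seq": "LVNELTEFAK", "prot_acc": "ALBU_HUMAN",
                 "ions_score": "58.2"}),
    from_sequest({"peptide": "LVNELTEFAK", "reference": "ALBU_HUMAN",
                  "xcorr": "3.41"}),
]

# With both engines in one schema, agreement becomes a set operation.
agreed = ({h.peptide for h in hits if h.engine == "mascot"} &
          {h.peptide for h in hits if h.engine == "sequest"})
```

Peptides both engines report independently make stronger protein calls than peptides only one engine finds, which is one practical payoff of a merged view.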
Clearly, the group is making headway. Wolf says that their earliest attempt at putting a sample set through a shotgun experiment yielded just 13 proteins. “We knew there was so much more data than that there,” she says. “We knew we had to optimize both how we got our data and what we did with it.”
Within six months the sample prep, productivity, and analytical tools had been drastically improved to the point that a poster Wolf exhibited at ABRF in Denver this year described how the team had identified 2,213 unique human proteins. Not only did Wolf get great feedback from the ABRF community, she won an Amersham Biosciences award for the work.
Her poster, “Methodology Development of Human Shotgun Proteomics Study,” described how the group activated the MAP kinase pathway in human K562 erythroleukemia cells to monitor global molecular responses. Following the MudPIT protein separation methodology developed by John Yates at the Scripps Research Institute, the group extracted proteins from the cells, digested them, and separated the resulting peptides offline by strong cation exchange chromatography. Peptides from those SCX fractions were further separated by reverse-phase chromatography coupled to a Thermo Finnigan LCQ Classic ion trap mass spec.
Wolf says her ABRF peers were most impressed by the nitty-gritty details of protein analysis her group got into. “We graphed all sorts of statistical computations. We looked at every sample prep method asking, ‘Is this better or is that better?’” Each week, PI Resing culls through some 60,000 lines of Excel spreadsheets comparing experimental conditions.
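The core of that condition-by-condition comparison is a simple tally: for each sample-prep variant, how many unique proteins were identified? A minimal sketch, with invented condition names and protein accessions standing in for the real spreadsheet rows:

```python
from collections import defaultdict

# Toy stand-in for the per-condition rows the team pulls from its
# spreadsheets; condition labels and accessions here are invented.
rows = [
    ("digest_overnight", "P00338"),
    ("digest_overnight", "P04406"),
    ("digest_overnight", "P68871"),
    ("digest_2h",        "P00338"),
    ("digest_2h",        "P04406"),
]

# Collect the set of unique proteins seen under each condition.
unique_by_condition = defaultdict(set)
for condition, protein in rows:
    unique_by_condition[condition].add(protein)

# Rank conditions by how many unique proteins each identified.
ranking = sorted(unique_by_condition.items(),
                 key=lambda kv: len(kv[1]), reverse=True)
best_condition, best_proteins = ranking[0]
```

Using sets rather than raw row counts matters: shotgun runs identify the same protein from many peptides, so unique accessions, not total hits, are the fair basis for comparing prep methods.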
“We try to look at many aspects of sample prep and data acquisition using our tools to assess how well we’re optimizing our samples,” Wolf says. “That’s one of the differences between 13 and 2,213 proteins.”
Ultimately, the group’s goal is to perfect the methodology to identify as many proteins as possible. It’s about “balancing high throughput with separation in a doable fashion,” Wolf says. “If other labs get into this, hopefully we will have plowed a small path in the snow for them.”