Like a locomotive pulling into the station to pick up its passengers, the Department of Energy's Genomes to Life project rolled into the proteomics world in late July with $103 million worth of steam. Those landing a first-class seat include three groups of researchers from government and academic labs who plan to use large-scale proteome analysis techniques. The focus of their pursuits: the machinations of microorganisms associated with the DOE’s goals of clean energy production and environmental remediation.
Although the proteomics-related projects target different organisms (all with already-sequenced genomes), the three are united by an emphasis on the study of protein complexes. With $57.5 million in funding, spread over either three or five years, the research teams plan to investigate new ways of tagging, isolating, and identifying protein complexes, and to find ways to archive the data in the public domain. In addition, the groups plan to develop bioinformatics tools that let researchers query the assembled data to help answer other biological questions.
In one program, led by Michelle Buchanan, the director of the chemical science division of Oak Ridge National Laboratory, researchers at ORNL, the University of Utah, the University of North Carolina, Chapel Hill, Argonne National Laboratory, and Pacific Northwest National Laboratory are teaming up to study all the protein complexes at work in Shewanella oneidensis and Rhodopseudomonas palustris, two microorganisms important for understanding bioremediation and energy cycles.
With their $23.4 million slice of the pie, to be disbursed over three years, Buchanan’s team plans to develop new techniques for isolating and identifying protein complexes that will address the inconsistencies inherent in many current techniques, such as those featured in analyses of the yeast proteome published early this year in Nature. Specifically, Buchanan and her team plan to focus on automating the in vitro expression of tagged proteins for use in pulling out complexes from the two microorganisms under investigation, and to use the tagged proteins to generate single chain antibodies as an alternative method for extracting protein complexes.
Buchanan’s team is also planning to devote resources toward developing lab-on-a-chip technologies for separating protein complexes pulled from cell lysates, and using the chips as interfaces with mass spectrometry. The group will draw on mass spectrometry resources at both ORNL and PNNL to analyze both intact proteins and tryptic peptides using ion trap, Q-TOF, and Fourier transform ion cyclotron resonance (FT-ICR) mass spectrometry. In addition to creating databases of their interaction data, the researchers plan to investigate various approaches for imaging protein complexes in vivo, including techniques based on near field optics, nonlinear optics, and multiphoton excitation.
Buchanan stressed that a primary goal of the program is to create an infrastructure for technology development operating in tandem with the research on the microorganisms. “We want to assess the techniques out there now, but also develop new affinity agents, and stabilize complexes so they don’t dissociate before we can observe them,” she said. “There’s also still work to be done on analysis [by mass spectrometry], including improving the dynamic range [for measuring the masses] of proteins in a sample.”
Crosslinking Peptides on a Grand Scale
As principal investigator for another Genomes to Life project, George Church, a professor of genetics at Harvard Medical School, shares an equally strong interest in rooting out protein complexes from the cells of the organisms. Together with researchers from MIT, Brigham and Women’s Hospital, and Massachusetts General Hospital, Church’s group will study three microorganisms, Prochlorococcus, Caulobacter, and Pseudomonas, relevant to carbon sequestration, bioremediation, and general metabolic action, respectively.
But rather than focus on high-end mass spectrometry techniques like FT-ICR, Church’s team is relying on more run-of-the-mill mass spectrometry instrumentation, consisting primarily of ion traps for tandem mass spectrometry analysis of peptides. His project will have access to five Thermo Finnigan ion trap mass spectrometers linked to 75 Linux CPU nodes.
Church’s first application of this setup will be to quantify protein expression under various cellular conditions. To do this, his group will employ various strategies, including ICAT reagents, absolute quantitation techniques developed at Harvard Medical School by Steven Gygi, and algorithms for mass spectral analysis developed in Church’s laboratory.
The group’s plan for isolating and characterizing protein complexes is equally varied. To study DNA-binding proteins en masse, Church’s group plans to rely on a technique that uses formaldehyde to crosslink 12 of the 20 amino acid side chains, and the peptide backbone for all 20. The researchers also aim to use affinity-tagged cross-linking reagents such as Sulfo-SBED from Pierce to trap protein-protein interactions in vivo. “We’re planning to do cross-linking on a scale not demonstrated before,” Church said.
Who Has the Courage to Share Their Raw Data?
What all the projects have in common, however, is the desire to build accessible databases of the protein information they acquire. Doing this will require significant investments in bioinformatics (Buchanan, for one, hopes to build a web-based system for releasing the data collected under her supervision), as well as a change in how researchers handle the raw data from their experiments.
“It’s important for us all to share proteomics data,” said Church, “but it’s a scary thing to put your [raw] data out there. It’s much easier to collect than to analyze in a small amount of time.” Releasing proteomics data under the same model as the genomic data acquired by the Human Genome Project, which had to be released within a week, makes researchers anxious that they won’t have time to draw conclusions they can publish, Church added.
“I’m not sure how it will all play out,” he said. “I helped establish the first RNA database, and now we’ll have to do the same for proteomics data. [But] we’re not scared.”