A team led by researchers at the University of California, San Francisco, has developed a resource for making functional predictions for protein post-translational modifications.
Detailed in a paper published in the July 20 edition of Cell, the resource is intended as a tool for guiding PTM research, helping scientists prioritize modifications for further investigation amid the dramatic increase in PTM data, Pedro Beltrao, a UCSF researcher and an author of the paper, told ProteoMonitor.
"Throughput has increased dramatically over the last five years or so, mostly because of new instruments like the [Thermo Fisher Scientific] Orbitrap, but also because of the enrichment methods that are now available," Beltrao said. "So, basically, we have these thousands and thousands of [PTM] sites, but there's no clear way of doing the functional analysis afterwards."
Adding to the need to prioritize PTMs for further inquiry is the fact that only a small portion of these modifications are likely to have important biological roles.
To attack this problem, the researchers performed a large-scale analysis of protein PTM data, developing methods to predict modifications involved in processes including cross-regulation, regulation of protein domain activity, and mediation of protein-protein interactions.
In the study, the UCSF team compiled roughly 200,000 phosphorylation, acetylation, and ubiquitination sites across 11 eukaryotic species. Through an analysis of the phosphosite data, the team demonstrated that phosphosite abundance and functionality are correlated with higher-than-average levels of evolutionary conservation. Based on this finding, as well as observations from previous studies, they developed a series of functional classifiers for protein modifications, using conservation as a measure of functional relevance.
Specifically, they found that PTM sites near other PTMs, and PTM sites located at interfaces involved in protein-protein interactions, were more likely to be functionally relevant. They also found that PTM conservation within protein domain families could predict regulatory "hot spots" linked to functionally important protein regions.
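As a rough illustration of the first two classifiers, the sketch below flags PTM sites that lie close to another PTM on the same protein or that fall on a residue annotated as part of an interaction interface. It is not the authors' implementation: the `PtmSite` type, the `flag_sites` function, the 10-residue window, and the toy protein `PROT_A` are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PtmSite:
    protein: str   # protein identifier (hypothetical)
    position: int  # 1-based residue index
    kind: str      # e.g. "phospho", "acetyl", "ubiquitin"


def flag_sites(sites, interface_residues, window=10):
    """Return a dict mapping each site to a set of functional flags.

    window is an assumed cutoff (in residues) for "near another PTM".
    interface_residues maps protein -> set of interface positions.
    """
    flags = {}
    for site in sites:
        found = set()
        # Flag 1: another PTM on the same protein within the window.
        for other in sites:
            if other is site or other.protein != site.protein:
                continue
            if abs(other.position - site.position) <= window:
                found.add("near_other_ptm")
                break
        # Flag 2: the modified residue is annotated as an interface residue.
        if site.position in interface_residues.get(site.protein, set()):
            found.add("at_interface")
        flags[site] = found
    return flags


# Toy data, invented for illustration only.
sites = [
    PtmSite("PROT_A", 50, "phospho"),
    PtmSite("PROT_A", 55, "acetyl"),
    PtmSite("PROT_A", 200, "phospho"),
]
interfaces = {"PROT_A": {200, 201}}
for site, found in flag_sites(sites, interfaces).items():
    print(site.position, sorted(found))
```

In practice the interface annotations would have to come from structural data, and the window size would need to be chosen and validated against known functional sites.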
They then sought to experimentally validate these findings, using their classification systems to predict the role of particular modifications in protein-protein interactions as well as to identify regulatory phosphosites in a yeast heat shock protein.
In the former case, the researchers predicted via their analysis that a phosphosite at position S162 in the yeast protein Skp1 regulated that protein's interaction with the protein Met30. They followed this prediction with an in vivo protein complementation assay testing the strength of that interaction, finding that phosphorylation of S162 weakens the Skp1-Met30 interaction.
In their heat shock protein investigation, the researchers used their analysis methods to identify two "hot spots" in the Hsp70 protein that were significantly enriched for phosphosites.
Both hot spots, they found, mapped to functionally important regions of the protein: one, the authors noted, "near the nucleotide binding pocket" and the second "near the entrance to the peptide binding groove." They then used these regions to predict regulatory phosphosites in the yeast Hsp70 SSA1. To test the functional relevance of the predicted sites, they generated yeast strains with mutations at these phosphorylation sites and found that the mutations affected a variety of Hsp70-dependent functions.
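The idea of a phosphosite "hot spot" can be sketched as a sliding-window enrichment scan over positions in a domain alignment. The example below is only an illustration under stated assumptions: it compares the observed count in each window with the count expected if sites were spread uniformly along the domain, and the function name `hot_spots`, the window size, and the ratio cutoff are hypothetical. The statistical test used in the paper is not described here and may differ.

```python
from collections import Counter


def hot_spots(aligned_positions, domain_length, window=5, min_ratio=3.0):
    """Scan a domain alignment for windows enriched in modification sites.

    aligned_positions: 1-based alignment columns of observed sites, pooled
        across members of a domain family (duplicates allowed).
    Returns a list of (start, end, observed, ratio) for enriched windows.
    """
    total = len(aligned_positions)
    if total == 0:
        return []
    counts = Counter(aligned_positions)
    # Expected sites per window if sites were uniformly distributed.
    expected = total * window / domain_length
    hits = []
    for start in range(1, domain_length - window + 2):
        observed = sum(counts[p] for p in range(start, start + window))
        ratio = observed / expected
        if ratio >= min_ratio:
            hits.append((start, start + window - 1, observed, ratio))
    return hits


# Toy positions, invented for illustration only.
for start, end, observed, ratio in hot_spots([10, 11, 11, 12, 13, 40], 50):
    print(f"columns {start}-{end}: {observed} sites, {ratio:.1f}x expected")
```

Overlapping enriched windows would normally be merged into a single region, and a real analysis would need a proper significance test and a correction for multiple testing.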
Citing these findings as an example of how their resource might help direct PTM research, the authors noted that the results "open the way to future studies dissecting the precise contribution of regulation of each region to overall Hsp70 function as well as the modulation of its activity under various growth conditions."
Researchers can search the database "by protein, by species," Beltrao said. "So they can go and find PTMs that have been previously described and what they are likely to be doing for [a given] protein. If [the modification] is in an interface region it will have the position; if it is in a domain region that we think is regulatory, that is there. Also if it is known to be regulated in previous experiments and if it is conserved [across species], that is all there."
Moving forward, the researchers plan to continue adding data to the resource, Beltrao said, noting that since submitting the Cell paper they have added more species and PTM data.
More challenging than compiling this data, he said, was coming up with new functional classifiers by which researchers might target their studies.
"Broad classes, in the sense of: What is the mechanism by which these PTMs exert their function?" Beltrao said. "In the sense that you can say, for instance, that [PTMs] at an interface [region] are likely to regulate [protein-protein] interactions."
"In the same way, we have to come up with additional broad classifiers," he said. "For example, PTMs that overlap with localization signals. If you can predict very well localization signals, then you can say, 'If I modify this sequence in the protein, it's likely to change the protein localization.'"
Coming up with such classifiers "is the hardest challenge," he said. "It requires the whole cell biology field thinking about these things."
As a potential example of such a classifier, Beltrao cited another proteomic study in the same issue of Cell, in which a team at the Scripps Research Institute performed a global study of interactions between caspase proteases and protein phosphorylation (PM 7/27/2012).
"That's a cool example where they found phosphorylations near protease cleavage sites ... so you could create a classifier that says, 'OK, if I have this PTM next to a cleavage site, I can regulate that cleavage,'" he said.
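A classifier of the kind Beltrao describes, whether for cleavage sites or for localization signals, could be sketched as a simple positional rule: report any annotated sequence feature that a PTM overlaps or sits next to. The function name `overlapping_features`, the feature tuples, and the margin value below are assumptions made for illustration, not part of the published resource.

```python
def overlapping_features(ptm_position, features, margin=0):
    """Return labels of features that the PTM overlaps or lies near.

    features: list of (start, end, label) tuples, 1-based inclusive.
    margin: assumed number of flanking residues that still count as "next to".
    """
    return [
        label
        for start, end, label in features
        if start - margin <= ptm_position <= end + margin
    ]


# Toy annotations, invented for illustration only.
features = [
    (100, 100, "cleavage_site"),
    (20, 30, "localization_signal"),
]
print(overlapping_features(103, features, margin=5))  # near the cleavage site
print(overlapping_features(25, features))             # inside the signal
```

As Beltrao notes for localization signals, a rule like this is only as good as the underlying feature predictions.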
Beltrao recently accepted a position as a group leader at the European Bioinformatics Institute in Cambridge, UK, where, he said, one of his primary projects will be to continue building the PTM resource.