Researchers in George Church's laboratory at Harvard Medical School have developed a computational method for predicting post-translational modifications that they said cannot be identified with current mass spectrometry-based techniques.
Using their strategy, described in a research paper in the current issue of Molecular & Cellular Proteomics, the authors were able to identify novel phosphorylation and acetylation sites across several organisms.
They added that the strategy also proved that some PTMs occur across different species, something that until now has mostly been assumed.
The strategy devised by lead author Daniel Schwartz and co-researchers uses an algorithm called motif-x to extract motifs from a sequence database. They used motif-x to determine phosphorylation motifs in yeast, fly, mouse, and man, and lysine acetylation motifs in man. They then scanned the motifs against proteomic sequence data with a new algorithm called scan-x to predict other potential modification sites.
Schwartz developed the motif-x method in 2005 while he was still doing his graduate work. Now a post-doc in Church's lab, Schwartz said the algorithm was originally devised as a way to take sequence data from all the studies that were coming out about phosphorylation sites and "getting potential motifs out … and to maybe understand the kinases that are active [and] to do this in automated fashion."
"It was going to extract out over-represented patterns from mass-spec experiments that had thousands upon thousands of phosphorylation sites and to do it in an automated fashion," Schwartz told ProteoMonitor this week.
Once that was achieved, the next step was to take the motifs and scan them against a proteome "to make additional sites that might be phosphorylated based on your initial dataset," Schwartz said. And that's where scan-x comes in: By locating and scoring motifs in a protein sequence, scan-x can predict PTMs.
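As a rough illustration of the scanning step, and not the scan-x algorithm itself, the sketch below slides a simple motif pattern along a protein sequence and reports where the candidate modified residue falls. The function name `scan_sequence`, the toy protein, and the `R..S` pattern are all assumptions made for the example; scan-x also scores each match, which this sketch does not attempt.

```python
import re

def scan_sequence(sequence, motif, site_offset):
    """Return 1-based positions of the candidate modified residue.

    sequence    -- protein sequence as a string of one-letter residue codes
    motif       -- pattern with '.' as a wildcard residue, e.g. 'R..S'
    site_offset -- 0-based index of the modified residue inside the motif
    """
    # A lookahead is used so that overlapping matches are all reported.
    pattern = re.compile(f"(?=({motif}))")
    return [m.start() + site_offset + 1 for m in pattern.finditer(sequence)]

# Hypothetical toy protein, not a real sequence.
toy_protein = "MKRAASPEERRLSPGT"

# The serine sits at index 3 of the pattern 'R..S'.
hits = scan_sequence(toy_protein, "R..S", 3)
print(hits)  # [6, 13]
```

In this toy case the two reported positions are the serines that sit three residues downstream of an arginine; a real predictor would rank such hits by score rather than treat every match as equally likely.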
For their MCP study, Schwartz and his colleagues concentrated on phosphorylation data, "but it certainly doesn't have to be limited to that," Schwartz said. "It can scan anything that goes into motif-x and extract out motifs."
Comparing their approach on phosphorylation prediction against two recently published tools, the authors reported "substantial improvements" in both sensitivity and specificity.
NetPhosYeast is an artificial neural network-based serine and threonine phosphorylation predictor specifically for yeast. In their work using NetPhosYeast, the researchers achieved a sensitivity of 93.7 percent and a specificity of 39.2 percent.
Phosida is a tool aimed at human and mouse serine and threonine phosphorylation prediction using a support vector machine strategy. Using it against a dataset of human serine and threonine phosphorylation sites, Schwartz and his colleagues achieved a sensitivity of 12.2 percent and a specificity value of 97.3 percent. In the mouse dataset, sensitivity rose to 22.2 percent at a specificity of 97.1 percent.
By comparison, they reported a sensitivity of 23.3 percent and a specificity of 97.3 percent with their computational approach using motif-x and scan-x.
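For readers less familiar with the two metrics, sensitivity is the fraction of true modification sites that a predictor finds, and specificity is the fraction of non-sites that it correctly rejects. The short sketch below shows the standard formulas. The counts are hypothetical, chosen only so that the results reproduce the 23.3 percent and 97.3 percent figures quoted above; they are not taken from the paper.

```python
def sensitivity(true_positives, false_negatives):
    """Fraction of real sites that were predicted as sites."""
    return true_positives / (true_positives + false_negatives)

def specificity(true_negatives, false_positives):
    """Fraction of non-sites that were correctly left unpredicted."""
    return true_negatives / (true_negatives + false_positives)

# Hypothetical counts for illustration only (not from the MCP paper).
tp, fn = 233, 767   # 1,000 known sites
tn, fp = 973, 27    # 1,000 known non-sites

print(f"sensitivity: {sensitivity(tp, fn):.1%}")
print(f"specificity: {specificity(tn, fp):.1%}")
```

The trade-off between the two is why the tools are compared at a matched specificity: a predictor that flags nearly every residue would score high on sensitivity but poorly on specificity.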
They also compared their method against Scansite, a program that uses position-specific scoring matrices "derived experimentally for individual kinases to make phosphorylation predictions," and is one of the most commonly used tools for phosphorylation predictions, the authors wrote.
Applied against the human serine and threonine phosphorylation test sets in their experiment, Scansite recorded a sensitivity of 13.2 percent at a specificity of 97.6 percent. Meanwhile, using their own method, Schwartz and his co-researchers achieved a sensitivity of 21.4 percent at "the equivalent specificity."
They then compared their human tyrosine phosphorylation predictions against the Scansite tyrosine kinase prediction tool at medium stringency: Scansite achieved sensitivity of 5.1 percent at a specificity of 97.5 percent. Scan-x yielded a 9.1-percent sensitivity at a 97.5-percent specificity.
However, they noted that the lower sensitivity in Scansite may be due to a lack of complete kinase-specific data, and not to the Scansite algorithm. "Several of the motifs discovered [in their research] do not correspond to those of any known kinase," they wrote in their paper.
In addition, Schwartz and his colleagues tested their approach against two studies that were published as they were preparing their MCP manuscript. The first was a study into phosphorylation sites "carried out under DNA damage conditions by treating cells with methyl methanesulfonate, an agent known to activate a number of damage-specific kinase pathways." The second was work looking at phosphorylation in Drosophila Kc167 cells.
Schwartz's team wrote that their approach was able to predict approximately 27 percent of the phosphorylation sites from the two comparison studies at a 95-percent specificity rate. Meantime, the sensitivity rates were a "modest decrease from the expected values of 37.7 percent for yeast and 31.8 percent for fly," but "given the unique nature of these new datasets, they … serve to highlight the robustness of the prediction procedure.
"In time as new protein modification studies are added to our training sets with a wide variety of experimental conditions, we expect that the discrepancy between our predicted sensitivity and actual sensitivity to approach zero," according to the MCP paper.
Research into PTMs has moved into the forefront of protein and proteomic work because of their suspected roles in disease, and because of newly developed high-throughput technology, most notably mass specs, which have led to an explosion of PTM data.
Yet despite this increase, the continuing rate at which new PTMs are identified "demonstrates the fact that our knowledge of all PTMs are not yet near the point of saturation," the authors said in their article.
In addition, most information about PTMs covers phosphorylation, and even though there are other types of modifications for which "large enzyme families are known … little substrate PTM data" for them exists, Schwartz's team wrote in their paper. Consequently, researchers currently need a computational approach for PTM identification.
To be sure, mass spectrometry has been successful in detecting PTMs, identifying more than 40,000 localized sites in the past five years alone. But "the current state of the art in mass spectrometry provides uneven sequence coverage of proteins because of systematic biases that are not completely understood, and sequence coverage typically varies widely between 20 and 40 percent," the authors of the MCP paper said.
Some researchers are currently studying ways of using mass specs to increase coverage, and new mass-spec methods, including electron-transfer dissociation, are allowing others to locate more PTM sites. And as mass specs continue to become more sensitive, the instruments will allow greater detection of such modifications, experts have said.
But, even with these advances in mass spectrometry, computational tools such as the one described in the MCP paper are needed in cases where the modifications cannot be directly seen, Schwartz said.
Using the strategy he and his colleagues developed, "we can make an educated guess as to where the phosphorylation sites may be happening in those proteins" that aren't covered by mass spec, said Schwartz.
And as the amount of mass-spec data increases, their approach should also lead to a jump in the number of modifications it can predict, he said. "That's something that's going to be important for us to do: to update our databases and grow with the community because data is being added at an unbelievable pace right now," Schwartz said.
In addition to phosphorylation, Schwartz's team used their method to computationally extract acetylation motifs for the first time. Based on an inspection of several motifs, there appears to be a preference for glycine and lysine in the residues immediately surrounding the acetylation residues at the +1 position, they wrote in their paper.
"These motifs may represent differences in acetyltransferase enzyme specificities," they said.
According to Schwartz, as far as motifs are concerned, "it seems like there are acetylation-specific motifs that are fairly ubiquitous."
While phosphorylation has been the most studied PTM, he also predicted that based on the number of predictions made by scan-x, acetylation could very soon rival phosphorylation as a PTM of interest.
The PhosphoSite database has about 3,000 acetylation modifications, but using the motif-x and scan-x algorithms at a specificity of 95 percent, there are an additional 54,000 predicted modifications. At 99-percent specificity an additional 18,000 modifications can be expected, Schwartz said.
The method should be effective for detecting any PTM "that we can get at by mass spec and [for which we] have a decent amount of data … assuming that there is some residue dependency surrounding the PTM," he added.
He said he has used the approach on datasets for PTMs such as glycosylation and SUMOylation, but didn't include the data in the MCP paper because the datasets weren't large enough. However, "the data is very similar" to what they found with phosphorylation and acetylation, "and everything seems to point to the fact that you're going to get very similar sensitivities and specificities," he added.
The strategy is also able to predict PTMs across different organisms. While the biological community has "assumed the motifs would be conserved" across organisms, "it has never been proven," Schwartz said.
"There might be slight differences between various organisms as far as what their motif preferences are … so I think it is kind of important if you're going to carry out a prediction to make it organism-specific," he said.
In their paper, the authors said the predictive functionality of scan-x coupled with motif-x "will provide the necessary bridge between those who work on the proteomic scale and those who work on the protein scale."
Future research, they added, will focus on improving the predictive performance of the method and adding data from different modification sites.
"It is the concerted interaction of numerous protein modifications that likely contribute to a significant amount of phenotypic variability … and it is therefore our hope that protein modification prediction can also become a useful tool for interpreting diversity in human populations and in those other species," they said in the paper.