Researchers Use Machine Learning to Map Functional Relevance of Human Phosphoproteome

NEW YORK – A team led by researchers at the European Molecular Biology Laboratory (EMBL) has analyzed the functional relevance of more than 100,000 human phosphosites.

In a study published today in Nature Biotechnology, the researchers collected phosphosites observed in 112 previously published mass spectrometry experiments run on 104 different human cell lines or tissues and then developed a machine learning tool for identifying phosphosites likely to have functional importance. 

They then validated several of their machine learning-based predictions experimentally. Additionally, they demonstrated that known deleterious genetic alterations were more likely to impact phosphosites scored as likely functional.

Due to its key role in cellular regulation, phosphorylation is among the most widely studied protein post-translational modifications, with mass spec experiments now able to identify as many as 50,000 phosphopeptides in a single experiment.

At the same time, phosphorylation is poorly conserved evolutionarily, indicating that many phosphosites may not have high functional significance.

"Therefore," the authors wrote, "prioritization strategies are crucial to facilitate the discovery of highly relevant phosphosites."

They noted that researchers have explored a range of techniques for prioritizing phosphosites for study, including "identifying phosphosites that are highly conserved, are located at interface positions, and show strong regulation."

The EMBL team addressed the problem via machine learning, identifying 119,809 human phosphosites and collecting data on 59 features for each. Using a set of 2,638 phosphosites confirmed to regulate protein function, they integrated these 59 features into a single score of functional importance ranging from 0 (lowest likelihood of functional importance) to 1 (highest likelihood of functional importance).
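
As a rough illustration of how several per-site features can be collapsed into one score between 0 and 1, the sketch below trains a small logistic regression on synthetic data using only the Python standard library. It is not the authors' model: the article does not say which learning algorithm the team used, and the three made-up features, the synthetic data, and the names `train_logistic` and `functional_score` are all assumptions made for this example.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function; maps any real number into (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(rows, labels, epochs=300, lr=0.1):
    """Fit weights and a bias by batch gradient descent on the log loss."""
    n = len(rows)
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            # Prediction error for this site (predicted probability minus label).
            err = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def functional_score(x, weights, bias):
    """Combine one site's feature vector into a single score in [0, 1]."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


# Synthetic stand-in data (hypothetical): 50 "known functional" sites whose
# features are shifted upward, plus 450 background sites.
random.seed(0)
N_FEATURES = 3  # the study used 59 features; 3 keeps the sketch small
positives = [[random.gauss(1.0, 1.0) for _ in range(N_FEATURES)] for _ in range(50)]
background = [[random.gauss(0.0, 1.0) for _ in range(N_FEATURES)] for _ in range(450)]
rows = positives + background
labels = [1] * 50 + [0] * 450

weights, bias = train_logistic(rows, labels)
scores = [functional_score(x, weights, bias) for x in rows]
mean_pos = sum(scores[:50]) / 50
mean_bg = sum(scores[50:]) / 450
print(f"mean score, known functional: {mean_pos:.2f}")
print(f"mean score, background:       {mean_bg:.2f}")
```

The only point of the sketch is the shape of the approach: a labeled reference set of known functional sites is used to learn how to weight the features, and the learned weighting is then applied to every site to produce one score per site.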

As a proof of principle, they investigated the role of the pS60 phosphosite on RAN-binding protein 1 (RANBP1), which their algorithm identified as the highest-scoring phosphosite on that protein. They found that RANBP1 mutants showed a marked decrease in binding to the protein NEMP1, a known interaction partner of RANBP1.

They additionally identified a pair of high-scoring phosphosites on the protein SMARCC2 and demonstrated in a mouse model that these sites likely play a role in neuronal differentiation.

The study authors also looked at the interplay of genetic mutations and phosphosite functionality, finding that, as would be expected, "mutations mapping to phosphosites with a high functional score were more likely to be rare in human populations and pathogenic." On average, phosphosites impacted by pathogenic mutations had a functional significance score of 0.5, while those impacted by benign mutations averaged a score of 0.2.

While the scoring system doesn't perfectly capture the functional significance of the human phosphoproteome, it provides "a useful cutoff for an initial prioritization," the authors wrote. A cutoff score of 0.5 weeded out roughly 90 percent of the phosphoproteome while retaining around 50 percent of truly significant sites.
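
The two figures quoted for the cutoff, the share of all sites removed and the share of functional sites retained, can be computed as in the sketch below. The function name `evaluate_cutoff`, the toy scores, and the choice to keep sites scoring at or above the cutoff (rather than strictly above it) are assumptions for illustration and are not taken from the study.

```python
def evaluate_cutoff(scores, known_functional, cutoff=0.5):
    """Return (fraction of all sites removed, fraction of known functional sites kept).

    A site is kept when its score is at or above the cutoff (an assumption).
    """
    if len(scores) != len(known_functional):
        raise ValueError("scores and known_functional must have the same length")
    n_functional = sum(1 for f in known_functional if f)
    if not scores or n_functional == 0:
        raise ValueError("need at least one site and one known functional site")

    kept = [s >= cutoff for s in scores]
    n_removed = sum(1 for k in kept if not k)
    n_functional_kept = sum(1 for k, f in zip(kept, known_functional) if k and f)
    return n_removed / len(scores), n_functional_kept / n_functional


# Toy data (hypothetical): ten sites, two of them known to be functional.
toy_scores = [0.9, 0.6, 0.4, 0.3, 0.2, 0.2, 0.1, 0.1, 0.1, 0.05]
toy_known = [True, False, True, False, False, False, False, False, False, False]

fraction_removed, fraction_kept = evaluate_cutoff(toy_scores, toy_known)
print(f"removed {fraction_removed:.0%} of sites, "
      f"kept {fraction_kept:.0%} of known functional sites")
```

Raising the cutoff removes more of the phosphoproteome but also discards more functional sites, which is the trade-off behind treating 0.5 as a starting point for prioritization rather than a definitive filter.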