Researchers from the European Molecular Biology Laboratory have established a human phosphatase database utilizing a novel structure and sequence-based classification scheme.
The database, named DEPOD, for human DEPhOsphorylation Database, contains information on 223 human phosphatases and their substrates and could prove useful as a resource for guiding and interpreting research into these proteins, Maja Köhn, an EMBL researcher and one of the leaders of the effort, told ProteoMonitor.
Phosphatases, along with kinases, control protein phosphoregulation, the former working to remove phosphate groups and the latter working to add them. Kinases have emerged as important drug targets and a key area of proteomics research, with a number of research groups and companies investigating and cataloguing kinase-substrate interactions.
Phosphatases, on the other hand, have been less thoroughly researched, with considerably fewer of their substrates identified. Because phosphatases have traditionally been grouped by substrate specificity, this shortage of known substrates has made them difficult to classify. In some instances, classification has been further complicated by the diversity of a given phosphatase's substrates.
These issues, Köhn said, led her and her colleagues to adopt for their database work a new approach to phosphatase classification – one based on the structure and sequence of these proteins' catalytically active domains.
In work detailed in a paper published last week in Science Signaling, the EMBL team extracted 225 phosphatase domains and classified them based on their structure and sequence. Using this new classification method, the researchers restructured the 34 historically recognized classes of phosphatases into 18 families, matching 13 historical families to 13 of their new structure-based families and grouping the remaining 21 historical families into five of the new families.
This condensing of 21 historical families into five new families indicates a number of previously unidentified structural similarities between historical phosphatase classes, Köhn noted.
"We've found relationships that previously were not clear," she said, adding that she believes that the work would help suggest new potential substrates for some phosphatases.
For instance, in the case of phosphatases classified as tyrosine phosphatases, researchers "have typically looked at tyrosine substrates and, with a few exceptions, not really looked at other possible substrates," Köhn said. "So with our analysis and database, you can now look up your phosphatase of interest and see that it is not only the [previously predicted] protein substrates that are involved."
The database could prove helpful in generating hypotheses to pursue regarding particular phosphatases as well as in explaining unexpected results, she said.
The EMBL team's phosphatase reclassification effort, Köhn noted, parallels work done recently by University of Würzburg researchers who grouped haloacid dehalogenase phosphatases via a similar sequence- and structure-based classification system.
Both studies demonstrated "that structural classification can be very important," she said, noting that the University of Würzburg effort generated some "quite surprising results" regarding relationships between these phosphatases.
Köhn and her colleagues did not do any new experiments to generate phosphatase-substrate data for their study, relying on phosphatase-substrate relationships identified in previously published papers.
Via literature searches, the EMBL researchers linked 298 protein substrates and 89 non-protein substrates with 194 phosphatases, using this data to construct a human phosphatase-substrate network comprising 885 phosphatase-substrate connections.
As Köhn noted, a significant challenge to such work is determining whether or not observed interactions are, in fact, real. "One issue that always comes up in these discussions is whether or not a [proposed] substrate is valid," she said. "There are lots of suggested substrates and with phosphatases it is really difficult to determine yes or no."
To address this question, the researchers established a classification system for the phosphatase-substrate pairings included in DEPOD to give an idea of their confidence in the authenticity of individual interactions.
Interactions supported by either in vitro or in vivo assays in a single laboratory, or by a UniProt annotation alone, received a score of one – the lowest confidence level. Interactions verified by either in vitro or in vivo assays in multiple laboratories, or by both in vivo and in vitro assays in a single laboratory, received a score of two. Interactions verified by both in vitro and in vivo assays in multiple laboratories received a score of three.
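The three-tier scheme can be sketched in a few lines of Python. This is an illustration of the rules as described in this article, not code from DEPOD itself: the Evidence record, the confidence_score function, and the assay labels are hypothetical names, and the sketch assumes that "multiple laboratories" means at least two distinct laboratories across all of the evidence for a pairing.

```python
from dataclasses import dataclass

# Hypothetical labels for the two assay types discussed in the article.
ASSAY_TYPES = {"in_vitro", "in_vivo"}


@dataclass(frozen=True)
class Evidence:
    """One published observation of a phosphatase-substrate pairing (illustrative)."""
    lab: str    # identifier of the reporting laboratory
    assay: str  # "in_vitro" or "in_vivo"


def confidence_score(evidence, uniprot_annotated=False):
    """Return 1, 2, or 3 following the tiers described above, or None if unsupported.

    Assumption: "multiple laboratories" is read as two or more distinct
    labs across all evidence, not two or more labs per assay type.
    """
    for item in evidence:
        if item.assay not in ASSAY_TYPES:
            raise ValueError(f"unknown assay type: {item.assay!r}")

    assays = {item.assay for item in evidence}
    labs = {item.lab for item in evidence}
    both_assay_types = assays == ASSAY_TYPES
    multiple_labs = len(labs) >= 2

    if both_assay_types and multiple_labs:
        return 3  # in vitro and in vivo, multiple laboratories
    if both_assay_types or (assays and multiple_labs):
        return 2  # both assay types in one lab, or one type in several labs
    if assays or uniprot_annotated:
        return 1  # one assay type in one lab, or a UniProt annotation only
    return None   # no supporting evidence at all


if __name__ == "__main__":
    pairing = [Evidence("lab_a", "in_vitro"), Evidence("lab_b", "in_vitro")]
    print(confidence_score(pairing))
```

Under this reading, a pairing seen only in vitro, however many times within one laboratory, stays at a score of one until a second laboratory or an in vivo assay confirms it.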
In vitro data can be particularly problematic in such work because an interaction observed in vitro is no guarantee that the proteins will interact or, indeed, even be localized together in vivo. Köhn acknowledged this difficulty, noting that this was why in vitro-only interactions received the lowest confidence score.
However, she noted, "if we had taken out all of the in vitro data, there would have been much less data" in DEPOD.
Köhn said that she and her colleagues will continue their literature searches to update the database with new phosphatase-substrate data. They also hope that other researchers in the field will add their own observations to the database.
"The hope is that we can make this a community platform for people interested in this area of research," she said.
She added that outside researchers might use the information in DEPOD for additional computational projects – for instance, predicting potential phosphatase substrates based on existing phosphatase-substrate data.
However, given the wide range of substrates for some phosphatases, such a project would likely prove significantly more challenging than similar efforts to predict kinase substrates from existing data, Köhn said.
"I think that for specialist phosphatases it might be possible to develop [predictive] computational methods based on relatively few parameters such as sequence specificity," she said. "But in general this is far more complex for phosphatases than for kinases, and I think that a computational prediction program would need to include other parameters like cellular localization and such."
That said, "now that the data is correlated, people who want to try to develop these kinds of methods can download it and use it for such purposes," she added.
Köhn's own work focuses on developing substrate-based inhibitors of phosphatases, which, she noted, was her inspiration for the DEPOD project.
"We would like to take [phosphatase] substrates and modify them to make them inhibitors, and for that reason we are very interested in compiling all the substrates," she said. "So, for us, this has been very useful."