Georgia Tech Team Develops Computational Tool to ID Biologically Significant Protein Hotspots


NEW YORK (GenomeWeb) – Researchers from Georgia Institute of Technology have developed an informatics tool that uses protein modification and three-dimensional structure data to predict and rank biologically significant post-translational modification (PTM) hotspots.

The so-called Structural Analysis of PTM Hotspots (SAPH-ire) predicts hotspots based on how many times the parts of the proteins in question have been found in a chemically modified state when they are taken out of a cell, Matthew Torres, an assistant professor in GT's school of biology and one of SAPH-ire's developers, explained in a statement. He further described it as a discovery tool that "will lead to a new understanding of how proteins are connected in cells." Deeper insights into those connections could help speed up the search for new drug targets as well as help researchers better understand the genetic mechanisms that lead to disease, according to the developers.

Torres and his colleagues published a paper in Molecular and Cellular Proteomics earlier this month that describes SAPH-ire in detail, along with proof-of-concept tests using data from several G protein families from multiple model organisms. They described SAPH-ire in the paper as "a quantitative ranking method that integrates experimental PTM observations, sequence conservation, protein structure, and interaction data" to score and rank PTM hotspots "by their potential to impact biological function for distinct protein families."

According to the developers, SAPH-ire helps bridge a gap between PTM detection and analysis of function. Developments in mass spectrometry technology — specifically liquid chromatography-mass spectrometry — have led to an exponential increase in the number and types of known biochemical modifications that occur in cells. Laboratories across the globe have also collected large quantities of metadata on modification sites in various protein families and the effects of these modifications on protein function. At the same time, technological improvements have also beefed up the number of experimentally determined 3D protein structures that are now publicly available through repositories like the Protein Data Bank.

These datasets enable research into protein behavior, including how modifications change 3D structures and affect function, that a decade ago would have been impossible or possible only in a limited context, Torres told GenomeWeb. However, the capacity to generate data far outstrips the research community's capacity to understand what the PTMs are doing, he said, including how they alter protein structure and what, if any, downstream effects such alterations trigger.

SAPH-ire utilizes data from experimentally verified protein modifications and 3D protein structures that have been culled from public resources such as the Protein Data Bank to prioritize important hotspots, which are parts of protein sequences with PTMs that are repeated in the same position in sequences from multiple members of the same protein family. It works by projecting input PTM hotspots onto the 3D protein structures, which allows the entire set of family-specific PTMs to be visualized on any protein structure that is representative for the family. Once projected there, SAPH-ire integrates multiple quantitative features from each hotspot to create a so-called PTM "Functional Potential Score," which uses weighted criteria to rank hotspots in order of highest to lowest potential for having significant biological function — the higher the score, the more likely it is that hotspot is significant.
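The hotspot definition above (PTMs observed at the same position across members of one protein family) can be illustrated with a minimal Python sketch. This is not SAPH-ire's code, which the article does not show. The record keys (`family`, `protein`, `aligned_pos`, `ptm`), the function name, and the toy data are all hypothetical, and the sketch assumes each PTM site has already been mapped to a column of a family sequence alignment.

```python
from collections import defaultdict


def group_into_hotspots(ptm_records):
    """Group PTM observations by (family, aligned position).

    Each record is a dict with hypothetical keys: 'family', 'protein',
    'aligned_pos' (a column in a family sequence alignment), and 'ptm'.
    """
    grouped = defaultdict(list)
    for rec in ptm_records:
        grouped[(rec["family"], rec["aligned_pos"])].append(rec)

    hotspots = {}
    for key, recs in grouped.items():
        hotspots[key] = {
            # Total PTM observations at this aligned position
            "observations": len(recs),
            # Distinct family members carrying a PTM at this position
            "members": sorted({r["protein"] for r in recs}),
        }
    return hotspots


# Toy records; family, protein names, and positions are invented.
records = [
    {"family": "famA", "protein": "protein_1", "aligned_pos": 17, "ptm": "phosphorylation"},
    {"family": "famA", "protein": "protein_2", "aligned_pos": 17, "ptm": "phosphorylation"},
    {"family": "famA", "protein": "protein_3", "aligned_pos": 88, "ptm": "ubiquitination"},
]
hotspots = group_into_hotspots(records)
```

Keying on the alignment column rather than the raw residue number is what lets observations from different family members land in the same hotspot.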

The number of times a modification shows up within members of a protein family, or the observation frequency, is one metric used to grade hotspots, but SAPH-ire also counts how many times the changed amino acid in the hotspot sequence is actually compatible with the PTMs that occur at the site, such as the addition of a phosphoryl group, Torres said.

The more frequently that PTM-friendly changes are observed in the hotspots, the more likely it is that the modification is crucial to protein function. SAPH-ire also takes into account whether the hotspot is exposed on the surface of the structure or if it's buried deeper in the folds and awards higher scores to residues that have greater surface accessibility as these are more likely to be regulated by the cell. It also looks at whether the hotspot occurs at a protein interface — the part of the protein that interacts with other proteins in the cell — as these are also likely to impact biological function.
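The idea of combining these features into one weighted score can be sketched as follows. The article does not give SAPH-ire's actual formula or weights, so everything here is an assumption for illustration: the `HotspotFeatures` fields, the normalization of each feature to the range 0 to 1, the linear combination, and the weight values.

```python
from dataclasses import dataclass


@dataclass
class HotspotFeatures:
    name: str
    observation_freq: float       # assumed normalized to 0..1
    ptm_compatible_frac: float    # fraction of PTM-compatible residues, 0..1
    surface_accessibility: float  # assumed normalized to 0..1
    at_interface: bool            # hotspot lies at a protein interface


# Hypothetical weights chosen only for illustration.
ILLUSTRATIVE_WEIGHTS = {
    "observation_freq": 0.4,
    "ptm_compatible_frac": 0.3,
    "surface_accessibility": 0.2,
    "at_interface": 0.1,
}


def functional_potential(h, weights=ILLUSTRATIVE_WEIGHTS):
    """Weighted sum of hotspot features; higher means higher priority."""
    return (
        weights["observation_freq"] * h.observation_freq
        + weights["ptm_compatible_frac"] * h.ptm_compatible_frac
        + weights["surface_accessibility"] * h.surface_accessibility
        + weights["at_interface"] * (1.0 if h.at_interface else 0.0)
    )


def rank_hotspots(hotspots, weights=ILLUSTRATIVE_WEIGHTS):
    """Return hotspots sorted from highest to lowest score."""
    return sorted(
        hotspots,
        key=lambda h: functional_potential(h, weights),
        reverse=True,
    )


# Toy hotspots with invented feature values.
exposed = HotspotFeatures("exposed_interface_site", 0.9, 0.8, 0.7, True)
buried = HotspotFeatures("buried_rare_site", 0.2, 0.3, 0.1, False)
ranked = rank_hotspots([buried, exposed])
```

In this toy case the frequently modified, surface-exposed interface site ranks first, which mirrors the qualitative behavior described above.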

Generally speaking, the state of the art has been to use one of the aforementioned approaches in isolation to try to identify important hotspots, according to Torres. "This paper demonstrates for the first time that integrating multiple factors … actually does a really great job of distinguishing PTM hotspots that have a known function," he said. "That’s where we've hopefully moved the field forward."

For the study, the researchers built a database of PTM data from eight unique G protein families gathered from 12 public databases. The list included a family of proteins called heterotrimeric G proteins, which play an important role in transmitting signals to cells about their environments and getting them to respond to the presence of hormones, neurotransmitters, and so on.

Because of their therapeutic relevance, heterotrimeric G proteins have been well studied, with ample publicly available information about their 3D structures, the precise mechanisms of their activation, and the mechanisms they use to activate other proteins, according to Torres, who has himself spent time studying the proteins. The breadth of available knowledge made this family an ideal testbed for putting SAPH-ire through its paces, he said; however, the approach can work on any protein family with available PTM and structure data.

For the test, the researchers identified 1,728 experimentally verified PTMs, which they used to identify 451 unique hotspots, 51 of which had demonstrated biological functionality. They then used SAPH-ire to analyze the hotspots to see whether the method could quantitatively predict which hotspots would be biologically significant. They report that their approach "improves the prioritization ranking of PTM hotspots" compared to other ranking methods.
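One simple way to compare two ranking methods against a set of hotspots with known function is recall in the top k positions. The article does not say which evaluation metric the researchers used, so the metric choice, the function name `recall_at_k`, and the toy rankings below are assumptions made purely for illustration.

```python
def recall_at_k(ranked_ids, known_functional, k):
    """Fraction of known-function hotspots found in the top k of a ranking."""
    known = set(known_functional)
    if not known:
        raise ValueError("known_functional must not be empty")
    top_k = set(ranked_ids[:k])
    return len(top_k & known) / len(known)


# Toy rankings of five invented hotspot IDs from two hypothetical methods.
ranking_combined = ["h1", "h4", "h2", "h3", "h5"]
ranking_single_feature = ["h3", "h1", "h5", "h4", "h2"]
known = {"h1", "h4"}

recall_combined = recall_at_k(ranking_combined, known, k=2)
recall_single = recall_at_k(ranking_single_feature, known, k=2)
```

A method that places more of the known-function hotspots near the top of its list scores higher, which is one way to make "improved prioritization" measurable.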

In addition to ranking known hotspots, SAPH-ire also awarded high marks to hotspots not previously known to be biologically significant. Further exploration of one of these hotspots led to the discovery of a new regulatory element involved in cell signaling within one of the heterotrimeric G protein families that, according to Torres, has been largely ignored "because it's pretty unimpressive from a purely structural viewpoint."

To validate the in silico prediction, the researchers turned to yeast cells and ran multiple experiments to assess the effects of biological stimuli on the hotspot and the effects of mutations in the amino acid sequences on the site. Their experiments confirmed SAPH-ire's predictions, showing that not only was the hotspot modified in response to stimuli but also that mutations in the amino acid sequence had an effect on the stability of the protein in the cell.

For their next steps, Torres and colleagues hope to work with other scientists to try out SAPH-ire on the protein families they study. The GT researchers aren't making SAPH-ire available for researchers to run, Torres said, but instead plan to provide a database of results that other protein scientists can query to help them identify and prioritize PTM hotspots. The team has already used the tool internally to analyze PTM hotspots in other protein families besides heterotrimeric G proteins and will make all of the information available for use through the database, he said. "I think we are just at the beginning of understanding what this can tell us."

The developers also hope to link up with commercial vendors that sell software for interpreting mass spectrometry data, Torres said. Currently, these tools provide lots of information about PTM locations and their involvement in cellular pathways but virtually nothing on hotspots and protein structure, he told GenomeWeb. He envisions partnering with these vendors to roll SAPH-ire's predictions and results into their systems.