Researchers from the University of Dundee and the University of Manchester have completed a reanalysis of a large-scale phosphoproteomic dataset, identifying a number of adenosine diphosphate-ribosylation sites.
The research, which was published this week in the online edition of Nature Methods, offers insights into this little-studied post-translational modification, Ivan Matic, a Dundee researcher and an author on the paper, told ProteoMonitor. Significantly, it also provides an example of novel findings obtained via reanalysis of previously performed proteomic studies – demonstrating the potential utility of repositories of raw mass spectrometry data.
ADP-ribosylation is a PTM generated by ADP-ribose polymerases and ADP-ribose transferases, enzymes involved in a variety of cell functions including DNA repair and apoptosis. Matic began studying the modification roughly eight months ago, at which point he learned that mass spec analysis of ADP-ribosylation typically relies on the use of titanium dioxide for enrichment – also a common enrichment procedure in phosphoproteomic studies.
This, he said, suggested that he might be able to identify protein ADP-ribosylation by investigating "some of the [raw data] files from large-scale phosphoproteomic studies."
Matic and his colleagues obtained the raw mass spec files from a phosphoproteomic study by Harvard University researcher Steven Gygi, published in Cell in 2010, in which the scientists identified roughly 12,000 proteins and 36,000 phosphorylation sites from nine different mouse tissues. Reanalyzing those files, the Dundee and Manchester researchers identified 88 mono-ADP-ribosylation sites on 79 proteins, finding that the modification was more prevalent on arginine residues than previously thought and was prominently associated with tubulins and translation initiation factors.
Beyond the study's specific insights into ADP-ribosylation, Matic said, the work also offers a look at the potential benefits of more widely sharing raw mass spec data and the current challenges involved in doing so.
"The whole process of collecting [raw] files and analyzing them was more complicated than I expected," he said, noting that for the majority of large-scale studies he considered, the raw files were not available.
"Either they had not been submitted in the first instance to a repository, or they had been submitted but it was not possible to find them, or the files were there in the repository but I couldn't analyze them because they had been corrupted," Matic said. He added that he and his colleagues went through roughly 15 studies before they found the Gygi Cell study that they ultimately used in their reanalysis.
Matic downloaded the raw files for the study from Tranche, the proteomics community's primary repository for such data, but not without some hiccups. Led by University of Michigan researcher Phil Andrews, Tranche has been troubled in recent years by a lack of funding, which has forced the resource to operate at reduced capacity and led to difficulties in uploading and downloading files.
Matic had trouble downloading files from the repository, and in some instances he managed to download raw files only to find that they were corrupted.
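Corruption of this kind is commonly caught by publishing a checksum alongside each file and comparing it after download. The Python sketch below illustrates that general idea only; it does not describe how Tranche or any other repository actually works. The function names and the choice of SHA-256 are assumptions made for illustration.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks.

    Reading in chunks keeps memory use flat for large raw files.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path, expected_hex):
    """Compare a downloaded file against a published digest.

    `expected_hex` is a hypothetical checksum that a repository
    would have to publish next to the file.
    """
    return sha256_of_file(path) == expected_hex.strip().lower()
```

A depositor would record the digest at upload time, and anyone retrieving the file could call `is_intact` on the downloaded copy before attempting to analyze it.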
"The problem was that there was no mechanism to check if [given] raw files were actually OK or not," he said. "So basically if someone was transferring raw files to Tranche and something went wrong … it wasn't possible to find out" before downloading.
In one case, Matic tried to get around this issue by contacting a group directly for raw files, but, he said, "the whole thing became very complicated."
"A post-doc had originated the raw data, and she was not in the lab anymore. So then I tried to contact her, but she didn't reply," he said. "Maybe if I had persisted more, spent more time on it, but the whole process was getting so complicated that I lost interest."
By providing a central location for raw mass spec files, repositories like Tranche aim to streamline data sharing and allow researchers to avoid such complications. However, finding steady funding sources for such repositories has proven a challenge, leaving researchers and journal publishers to scramble for suitable places to store raw mass spec data (PM 9/30/2011).
In May, the European Bioinformatics Institute announced plans to provide storage for raw mass spec data as part of its Proteomics Identifications Database, PRIDE (PM 5/4/2012), a move that could help make up for Tranche's reduced capacities.
That resource, however, is still in the early stages with regard to raw data storage, Matic said. As of May, EBI had accepted two raw data submissions as an initial test of its system. According to Henning Hermjakob, EBI's team leader of proteomics services, the institute aims to have the system fully operational and accepting submissions from all comers by the end of the year.
Matic suggested that a major factor behind the difficulties of mass spec data repositories "is that the [proteomics] community is not convinced that collecting raw files and sharing raw files is necessarily something that is very useful.
"And the reason the community is not convinced is because so far there is [not much] evidence that this is useful," he said. He noted that while researchers have often reanalyzed mass spec data sets in order to compare different informatics platforms or evaluate new software, efforts to find new biological phenomena in old data have been less common.
"If you can show that it is possible to find something completely new from a biological perspective – like an unexpected post-translational modification – I think the proteomics community can be convinced that [improving sharing of raw files] is something important," he said.
Reanalyzing the Gygi data, as opposed to generating it fresh, "saved me time and money," Matic said, adding that it was "also like an ethical issue" because it didn't require sacrificing additional mice.
Going forward, Matic said he plans to continue his work on ADP-ribosylation, moving next to look at the modification on glutamate residues, where it has been implicated in various forms of cancer. For this stage of the research he will generate his own data as well as continue to investigate data from previous proteomic studies, he said.