An international consortium of researchers has established a repository of contaminants observed in affinity-purification mass spectrometry data with the aim of improving the quality of protein-protein interaction experiments.
Aptly – and, at least by the standards of scientific acronyms, hilariously – named the CRAPome, for the Contaminant Repository for Affinity Purification, the resource contains data from the negative controls of hundreds of AP-MS studies, aiding researchers' analyses by giving them access to far more control sets than they could generate on their own.
Commonly used for protein-protein interaction research, AP-MS consists of affinity purification of a target, or bait, protein followed by mass spec analysis to identify and quantify proteins bound to the target.
In such experiments, however, distinguishing between true protein interactions and non-specific false positives can be a challenge. Typically, researchers approach this problem by running negative controls in which they perform the experiment without the bait protein, identifying contaminant proteins that are picked up during various stages of sample preparation or affinity purification.
However, as Anne-Claude Gingras, a researcher at Toronto's Lunenfeld-Tanenbaum Research Institute and a leader of the CRAPome effort, told ProteoMonitor, the limited number of negative controls typically run in small-scale AP-MS studies can fail to identify the full complement of contaminants present in a PPI dataset.
In part, this is due to the stochastic nature of mass spectrometry, which means that each mass spec run will measure a somewhat different collection of proteins. This, Gingras noted, is a particular issue for low-abundance contaminants, which can often be missed in studies using a small number of controls.
Adding to the challenge is the fact that the contaminants present can be highly specific to the workflow used, meaning that small experimental variations can lead to changes in the non-specific interactors identified.
The CRAPome, which Gingras and her colleagues presented in a paper published this week in Nature Methods, aims to provide a large collection of negative control data specific to particular experimental conditions, allowing researchers to improve AP-MS contaminant analysis.
Hosted by the lab of University of Michigan researcher Alexey Nesvizhskii, a co-author of the study, and supported by funding from the National Institutes of Health, the repository currently hosts data from nearly 400 AP-MS experiments. Gingras said that the resource is the first "endeavor to collect data across multiple labs and protocols into a useable system" that she is aware of.
The database allows researchers to search for suitable negative control sets based on key experimental variables determining AP-MS contaminants – factors "including the cell line used, the type of lysis conditions, the epitope tag and affinity reagent, and the support matrix," Gingras said, noting that these "are in our experience the main variables, and we have organized the database around [them]."
In addition, the repository contains annotations covering aspects of AP-MS protocols outside of these main categories, for example, the amount of salt and detergent in lysis and wash buffers and the number of washes performed.
"We annotate protocols within the database as extensively as possible so that the user of the CRAPome can both have access to these protocols to generate their own samples, or select the protocols which are most adapted for their study," Gingras said.
Beyond introducing the resource, the authors analyzed which protein families are most frequently detected as contaminants, identifying among them heat shock proteins, keratins, tubulins, actins, elongation factors, histones, ribonucleoproteins, and ribosomal proteins.
Gingras noted that while this was the first such systematic analysis she was aware of, several of these protein classes have been widely regarded as contaminants by AP-MS researchers. "In fact," she noted, "some research groups specifically remove proteins including ribosomal proteins, tubulins or chaperones from their reported interaction data."
"We contend, however, that doing so arbitrarily is a bad practice," Gingras added. "Firstly, these proteins may be true interactors for a given bait protein, and should probably be reported if the quantitative data shows enrichment. Secondly, this also introduces bias in the data analysis, particularly when following the AP-MS analysis with functional analysis."
Proteins from these suspect classes have likely been underreported – not overreported – in the interaction literature, Gingras said, noting that the CRAPome resource could help counter this problem.
The repository allows researchers to score their interaction data via three computational methods: the commonly used SAINT algorithm, which uses statistical modeling of bait-prey spectral counts to determine the probability that an interaction is true; and two fold change-based scoring methods, FC-A, which averages spectral counts across all controls, and FC-B, which averages only the three highest spectral counts.
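The two fold-change scores can be illustrated with a short Python sketch. It is based only on the description in this article and is not the CRAPome's actual implementation: the pseudocount of 1 (added so that a prey never seen in the controls does not cause a division by zero), the function names, and the example counts are all assumptions made for illustration.

```python
from statistics import fmean

PSEUDOCOUNT = 1.0  # assumption: avoids dividing by zero when a prey is absent from all controls


def background_fc_a(control_counts: list[int]) -> float:
    """FC-A style background: average spectral count across all controls."""
    return fmean(control_counts)


def background_fc_b(control_counts: list[int], top_n: int = 3) -> float:
    """FC-B style background: average of only the top_n highest control counts."""
    top = sorted(control_counts, reverse=True)[:top_n]
    return fmean(top)


def fold_change(bait_count: int, background: float, pseudocount: float = PSEUDOCOUNT) -> float:
    """Ratio of the bait's spectral count to the background estimate."""
    return (bait_count + pseudocount) / (background + pseudocount)


# Hypothetical prey: 12 spectral counts with the bait; absent from 17 of 20
# controls but spiking in three of them.
controls = [0] * 17 + [9, 6, 3]
bait = 12

fc_a_score = fold_change(bait, background_fc_a(controls))
fc_b_score = fold_change(bait, background_fc_b(controls))
print(f"FC-A: {fc_a_score:.2f}, FC-B: {fc_b_score:.2f}")  # FC-A: 6.84, FC-B: 1.86
```

Because the average of the three highest values can never be smaller than the average of all values, FC-B is always at most FC-A for the same prey. A large gap between the two, as in this example, is the signature of a contaminant that is usually absent but spikes in a few controls.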
This combination of scoring techniques is not typically used in AP-MS work, Gingras said, but, she noted, it can offer increased confidence in interaction analysis. In particular, the addition of the FC-B score improves detection of contaminants that are generally present in small amounts but can spike in certain controls.
As the authors noted, such contaminants are typically "diluted out" when multiple experiments are used for fold change or SAINT analysis. However, because the FC-B score uses only the three highest spectral counts observed, it can pick up spikes in contaminants that might go undetected by the other methods.
"The use of the [FC-B] in association with either the standard FC-A or SAINT enables [researchers] to view quickly possible issues with proteins with spurious behavior," Gingras said. "When both scores are concordant, the confidence that an interaction is real is increased. Discordant data should be analyzed more carefully."
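The concordance check Gingras describes can be written as a simple decision rule. This sketch assumes each prey already has a primary score (SAINT or FC-A) and an FC-B score; the `Call` categories, the `concordance_call` function, and the threshold of 3 are hypothetical choices for illustration, not CRAPome defaults.

```python
from enum import Enum


class Call(Enum):
    """Possible outcomes of comparing a primary score with the FC-B score."""
    CONCORDANT_PASS = "both scores pass: higher confidence the interaction is real"
    CONCORDANT_FAIL = "both scores fail: likely background"
    DISCORDANT = "scores disagree: analyze more carefully"


def concordance_call(primary_score: float, fc_b_score: float,
                     primary_threshold: float, fc_b_threshold: float) -> Call:
    """Check a primary score (SAINT or FC-A) and FC-B against their own thresholds."""
    primary_pass = primary_score >= primary_threshold
    fc_b_pass = fc_b_score >= fc_b_threshold
    if primary_pass and fc_b_pass:
        return Call.CONCORDANT_PASS
    if not primary_pass and not fc_b_pass:
        return Call.CONCORDANT_FAIL
    return Call.DISCORDANT


# Hypothetical threshold: a fold change of at least 3 for both FC-A and FC-B.
FC_THRESHOLD = 3.0

# Hypothetical (FC-A, FC-B) pairs for three kinds of prey.
examples = {
    "steadily enriched prey": (8.0, 5.5),
    "spiky contaminant": (6.8, 1.9),
    "common background": (1.2, 0.6),
}
calls = {name: concordance_call(fc_a, fc_b, FC_THRESHOLD, FC_THRESHOLD)
         for name, (fc_a, fc_b) in examples.items()}
for name, call in calls.items():
    print(f"{name}: {call.value}")
```

Keeping separate thresholds for the two scores is deliberate: a SAINT probability and a fold change live on different scales, so a single cutoff would only make sense when both inputs are fold changes.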
CRAPome, she noted, allows researchers to apply this combination of analyses to their own control data whether or not they are using control data from the repository in their work.
"In this case, they benefit from the easy interface and visual displays without having to install any new programs on their computer," she said.