NEW YORK (GenomeWeb) – A team led by scientists at the Max Planck Institute of Biochemistry has identified protein marker panels that could help researchers better assess the quality of plasma samples used in protein biomarker experiments.
The panels comprise sets of proteins found at high abundance in erythrocytes, platelets, and coagulated samples and can be used to determine whether a given plasma sample has been contaminated by any of these components or processes during patient sample collection and handling.
Described in a bioRxiv preprint published last month, the panels are intended as a resource researchers can use to assess whether proteins identified in biomarker studies as differentially expressed are potential markers or more likely artifacts of the sample handling process, said Philipp Geyer, first author on the study and a postdoctoral researcher in the group of Max Planck professor Matthias Mann, whose lab led the effort.
The study also demonstrates how extensive such contamination is in protein biomarker research, Geyer said, noting that in an examination of 200 previously published clinical studies, he and his colleagues found that roughly half of the studies reported as potential biomarkers one or more proteins from their quality control panels.
The project stemmed from the Mann group's observation that certain groups of proteins were consistently regulated together across different diseases they were studying, Geyer said. Looking at these proteins, they identified a subset of them as coming from erythrocytes, suggesting that their plasma samples were contaminated with red blood cells.
The source of some of the other proteins was less clear, and so to investigate that question the researchers collected blood samples from 10 men and 10 women and separated them into fractions consisting of erythrocytes, platelets, platelet-rich plasma, and platelet-free plasma. They analyzed these fractions using LC-MS/MS to create reference proteomes for each and then compared the previously identified differentially regulated proteins to these reference proteomes to determine their source, finding that they came from platelets.
They then set out to build panels of erythrocyte and platelet proteins that they could use to determine if a plasma sample was likely contaminated by these blood components. They selected the 30 highest-abundance erythrocyte and platelet proteins that they were able to measure with coefficients of variation below 30 percent and that had at least 10-fold higher expression in erythrocytes and platelets than in plasma.
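The selection criteria described above can be sketched in a few lines of Python. This is an illustration only, not the authors' pipeline: the function names, the data layout (protein name mapped to a list of intensities), and the order of operations (filter first, then rank by abundance) are assumptions, as is the choice to count proteins undetected in plasma as passing the fold-change test.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return stdev(values) / mean(values)


def select_panel(component, plasma, panel_size=30, max_cv=0.30, min_fold=10.0):
    """Pick the most abundant, reproducibly measured, component-enriched proteins.

    component: dict of protein -> intensities in the cell fraction
               (for example erythrocytes or platelets)
    plasma:    dict of protein -> intensities in platelet-free plasma
    The dict layout and parameter names are hypothetical.
    """
    candidates = []
    for protein, values in component.items():
        # A CV needs at least two measurements and a positive mean.
        if len(values) < 2 or mean(values) <= 0:
            continue
        if coefficient_of_variation(values) >= max_cv:
            continue
        abundance = mean(values)
        plasma_values = plasma.get(protein, [])
        plasma_mean = mean(plasma_values) if plasma_values else 0.0
        # Assumption: a protein not detected in plasma passes the fold test.
        if plasma_mean > 0 and abundance / plasma_mean < min_fold:
            continue
        candidates.append((abundance, protein))
    # Rank the surviving proteins by abundance and keep the top panel_size.
    candidates.sort(reverse=True)
    return [protein for _, protein in candidates[:panel_size]]


if __name__ == "__main__":
    # Made-up intensities for placeholder proteins, for illustration only.
    fraction = {
        "prot_A": [1000, 1100, 900],
        "prot_B": [500, 520, 480],
        "prot_C": [800, 100, 1500],   # too variable
        "prot_D": [200, 210, 190],    # more abundant in plasma
    }
    plasma = {
        "prot_A": [10, 12, 8],
        "prot_B": [5, 5, 5],
        "prot_C": [1, 1, 1],
        "prot_D": [5000, 5100, 4900],
    }
    print(select_panel(fraction, plasma))
```

In this toy input, one protein is dropped for its high coefficient of variation and another because it is more abundant in plasma than in the cell fraction, leaving the two that meet both thresholds.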
They also compared 72 plasma and 72 serum samples to identify proteins most altered due to coagulation.
It is unsurprising that plasma samples are in some cases contaminated by other blood components or by coagulation, Geyer said, noting that "everyone had in mind that if you handle a sample, of course, something can go wrong." But, he added, the study provides a concrete example of this phenomenon as well as a tool researchers can use both to evaluate the quality of their plasma samples and to rescue markers identified in contaminated samples.
Using the panel to evaluate their own samples, Geyer said he and his colleagues have found that between a third and a half of their samples are "significantly biased." To an extent, he said, this is inherent in the way clinical samples are collected.
"The best study in the world would be collected with one study nurse who was trained very well and always using the same protocols and the cases and controls would always come to your institute on the same day they took the samples," he said. "In real life, though, you often have a doctor who if they want to collect [samples] for a clinical study is taking samples during his daily routine and then on another day collecting a control cohort and so of course you can have differences between the two."
He cited the example of samples from a longitudinal study in which the Mann lab looked at the proteomics of patients who experienced sustained weight loss over a period of time. Going back to evaluate data from that study using the contamination panels, the researchers found one time point where outliers from the platelet panel were prevalent. They consulted with their collaborators who had collected samples for the study and learned that at that time point the technician had used a different kind of collection tube than they had at the other time points.
Geyer said he envisioned several ways the panels might be used. Researchers could use them to test a subset of their plasma samples for the erythrocyte and platelet proteins at the beginning of a study, which could give an indication of whether the samples were collected and processed properly. The panels could also be useful for evaluating data from studies that have already been run, allowing researchers to weed out likely contaminants while retaining potentially real markers.
"If you see that some of them are markers in one of the panels, then it is clear that these proteins are probably not biomarkers, but maybe some others are biomarkers," he said. "The worst thing you could do is just toss away your whole study."
Geyer said that similar panels could be useful for studies using other sample types, such as urine or cerebrospinal fluid.
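The retrospective use Geyer describes, weeding out likely contaminants while keeping the remaining candidates, amounts to checking a study's reported markers against the panels. A minimal Python sketch follows, assuming each panel is available as a set of protein identifiers; the function name, panel names, and protein labels are hypothetical placeholders and are not taken from the preprint.

```python
def triage_candidates(candidates, panels):
    """Split candidate biomarkers into panel hits and retained candidates.

    candidates: iterable of protein identifiers reported by a study
    panels:     dict of panel name -> set of protein identifiers
    Returns (flagged, retained), where flagged maps each protein found in
    a panel to the sorted names of the panels that contain it.
    """
    flagged = {}
    retained = []
    for protein in candidates:
        hits = sorted(name for name, members in panels.items() if protein in members)
        if hits:
            flagged[protein] = hits      # likely a sample-handling artifact
        else:
            retained.append(protein)     # still a potential marker
    return flagged, retained


if __name__ == "__main__":
    # Placeholder panels and candidates, for illustration only.
    panels = {
        "erythrocyte": {"prot_A", "prot_B"},
        "platelet": {"prot_C"},
        "coagulation": {"prot_B", "prot_D"},
    }
    reported = ["prot_A", "prot_B", "prot_X", "prot_Y"]
    flagged, retained = triage_candidates(reported, panels)
    print(flagged)
    print(retained)
```

A flagged protein is not proof of contamination. In line with Geyer's point, the sketch only marks which candidates deserve suspicion and leaves the rest of the study's findings in place.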
According to Ruedi Aebersold, a professor at the Swiss Federal Institute of Technology (ETH) Zurich who was not involved in the study, the Max Planck effort is valuable in that it more thoroughly documents the widely accepted notion that contamination and biases of different kinds make some proteins poorly suited as biomarkers.
"It is, of course, well known that this happens," he said, "but what they do is they document it with a lot of data, which is nice."
Aebersold compared the work to the Contaminant Repository for Affinity Purification, or CRAPome, database, which contains data from the negative controls of hundreds of affinity-purification mass spectrometry experiments, allowing researchers to look at commonly observed false-positives in protein-protein interaction experiments.
Aebersold suggested that, as a political matter, the researchers might gain more support for the panels they described by working through a larger proteomics consortium like the Human Proteome Organization.
"I think they are on the right track, but it will be more valuable in the long run to have community involvement, maybe even half a dozen labs who will discuss it and then put their name on it and write a short commentary, just to get more traction," he said.
Aebersold added that erythrocyte and platelet contamination are just two among the many sources of bias that biomarker experiments would ideally consider.
For instance, he said, natural genetic variability makes some proteins "terrible biomarkers because they are simply too variable in a normal population to distinguish between cases and controls." Other proteins may be poorly suited as biomarkers due to variations in how they behave during digestion and other parts of the sample prep process.
"There are a lot of factors and many of them are rather unexplored," Aebersold said, "which is too bad, because biomarker studies are exceedingly expensive and most [markers] fizzle out" once they are tested in larger cohorts.
"If one had this sort of backstop where you could say, 'Look, if this protein pops up, it is likely not to work out for these reasons,' that would save a lot of time and money," he said.
He added that the study also highlights the value of using large discovery cohorts.
"If the cohort is of a certain size and the experimental design is well structured… you can actually figure out [likely biases] directly from that cohort itself," he said.