This story originally ran on July 14 and has been updated to include additional comments.
By Tony Fong
That sense of déjà vu some proteomics researchers may be experiencing could be due to stress.
Researchers in the Netherlands have identified 44 protein biomarkers that they say appear repeatedly in 2D gel electrophoresis-based disease biomarker studies and have linked them to cellular stress response.
The results are based on a review of 66 published studies performed between 2000 and 2007, with an emphasis on MALDI-MS. In addition to the 44 recurring proteins, the study, published in the June issue of Proteomics, identified 28 protein families as appearing at a disproportionate rate in the literature.
The study mirrors similar findings published last year by researchers in the Czech Republic [see PM 05/22/08]. Of the 23 most frequently detected proteins in the earlier study, 17 were also found by the researchers of the current study.
But in addition to providing interlaboratory confirmation of the 2008 study, the current work builds upon it by providing a reason for the frequency of some proteins across different biomarker studies, Ping Wang, the first author on the Proteomics article and a post-doc at Maastricht University, told ProteoMonitor this week.
She and her co-authors stopped short of calling the 44 proteins the biomarker equivalent of fool's gold, but noted that "in future proteomic studies, more profound approaches should be applied to look beyond these proteins to find specific biomarkers."
For their work, they focused on the 2DE technology because of its widespread use as a separation technique in comparative protein-profiling research, and the MALDI platform because of their own biomarker studies utilizing the technology. They looked at only five species (worm, fruit fly, mouse, rat, and human) because of the high-quality gene/protein databases available for them.
And while the study centered on the 2DE-MALDI workflow, Wang said that she suspects that a review of biomarker research using liquid chromatography-mass spec platforms would yield similar results, though the researchers are not pursuing such work.
The motivation behind the study, Wang said, was that she and her colleagues kept seeing the same proteins in their disease biomarker work, as well as research they conducted for others at the university.
"We thought that this may be some general phenomenon, and we wanted to identify it and find out what was behind it," she said.
Scouring the journals Proteomics, BBA-Proteins and Proteomics, Proteomics Science, Electrophoresis, and Molecular & Cellular Proteomics, she and her co-researchers made an unbiased selection of 66 studies performed in the five species, in vivo, ex vivo, or in vitro, on 20 different tissues and organs, and for 18 different experimental objectives.
An initial list of 1,931 protein entries was amassed, which was whittled to 1,872 because 59 of the entries could not be further processed. Of those, the researchers identified 892 unique proteins.
The average detection frequency of any one protein found in a study was 0.032. Using three times that value as a cut-off, the researchers found 44 proteins "frequently detected over all studies." While the proteins represent only 4.9 percent of all unique proteins, they account for 23.2 percent of all entries in the databases the authors used.
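The cut-off arithmetic can be checked against the figures reported in the article. The short Python sketch below is an illustration only: it assumes that the "average detection frequency" is the mean number of entries per unique protein divided by the number of studies, which is not spelled out in the article but does reproduce the reported 0.032. The function name `detection_cutoff` is hypothetical.

```python
import math

def detection_cutoff(total_entries, unique_proteins, n_studies, multiple=3):
    """Return (average detection frequency, cut-off) under the stated assumption."""
    # Assumption: average frequency = entries per unique protein, per study
    avg_frequency = total_entries / unique_proteins / n_studies
    cutoff = multiple * avg_frequency
    return avg_frequency, cutoff

# Figures reported in the article
TOTAL_ENTRIES = 1872     # entries kept after 59 were dropped from 1,931
UNIQUE_PROTEINS = 892    # unique proteins among those entries
N_STUDIES = 66           # published studies reviewed
FREQUENT = 44            # proteins above the cut-off

avg, cutoff = detection_cutoff(TOTAL_ENTRIES, UNIQUE_PROTEINS, N_STUDIES)

# Smallest number of studies a protein must appear in to exceed the cut-off
min_studies = math.floor(cutoff * N_STUDIES) + 1

# Share of unique proteins that the 44 frequent proteins represent
share_of_unique = 100 * FREQUENT / UNIQUE_PROTEINS

print(round(avg, 3), round(cutoff, 3), min_studies, round(share_of_unique, 1))
```

Under this reading, a protein would have to turn up in at least seven of the 66 studies to count as frequently detected, and 44 of 892 proteins works out to the 4.9 percent cited.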
Because of the different experimental conditions of the source studies, different lists of frequently detected proteins were generated, so drawing a definite conclusion linking the frequently detected proteins with the type of organ or tissue, or the experimental objective, was not possible, Wang and her colleagues said.
However, they also said that there was "clear overlap" of proteins, and proteins such as enolase and heat shock proteins were frequently detected in different source experiments regardless of the conditions.
In addition to finding individual proteins that were detected disproportionately in the 66 studies, the Maastricht team also saw certain protein families over-represented across the studies. Using the HUGO Gene Nomenclature Committee and UniProtKB family annotation, and using a cut-off of three times the average to define the frequently detected protein families, the researchers identified 28 over-represented families, including the HSP70 family, intermediate filament family, and the annexin family.
While the 28 protein families comprise 7.3 percent of the total number of protein families, they cover 37.6 percent of the database entries.
After establishing the identities of the recurring proteins and their families, Wang and her colleagues set out to find a possible reason they kept appearing in biomarker studies.
2DE is known to have limitations — such as an inability to detect low- and high-molecular weight proteins, an under-representation of hydrophobic proteins, and a low dynamic range and sensitivity — that could play a part in the recurrence of proteins in different studies.
But, according to Wang, "the more important part" explaining the frequency of certain proteins in biomarker studies is tied to the biological response of cells to stress.
When she and her colleagues performed a functional investigation and comparison, they found that many of the 44 proteins on their list of recurring proteins are also found on a list of 300 proteins that make up a minimal cellular stress proteome that had been proposed in other studies.
The key aspects of the cellular stress response include DNA damage sensing and repair; redox regulation; cell cycle control; molecular chaperone function; protein degradation; fatty acids/lipid metabolism; and energy metabolism.
The frequently detected proteins have "alike functions," and taken as a whole, the finding "strongly suggests that these proteins were differentially expressed in the diverse experiments [they examined] due to the intrinsic cellular stress response," the Maastricht researchers wrote.
While MALDI was also suspected to play a role in their results, Wang and her colleagues found that the platform was in fact unbiased. MALDI's ion yield is known to be governed by some physicochemical characteristics of peptides: for example, arginine-containing peptides "that fly more easily in MALDI" and hydrophobic peptides that may be poorly ionized, the researchers wrote.
In their analysis of the amino-acid frequency of the 44 proteins, they did not find differences except for high lysine content, which implies shorter peptides. MALDI, however, tends to identify larger peptides, so the higher lysine content would not be expected to favor detection of these proteins.
"The amino acids that are reported to enhance the ion yield in MALDI-MS such as arginine, phenylalanine, and tyrosine, were all with lower frequency but not significantly," the authors said. "Together this indicates that MALDI-MS does not favor identification of these 44 proteins over the others."
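The kind of amino-acid frequency comparison described above can be sketched in a few lines of Python. This is not the authors' analysis: the function name `residue_frequencies` is hypothetical, the sequences are made-up placeholders rather than proteins from the study, and no significance test is performed. It only shows how residue fractions for two groups of proteins might be tallied and set side by side.

```python
from collections import Counter

def residue_frequencies(sequences):
    """Return the fraction of each amino-acid letter across all sequences."""
    counts = Counter()
    for seq in sequences:
        counts.update(seq.upper())
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}

# Hypothetical placeholder sequences, NOT real proteins from the study
frequent_group = ["MKKLAKDEK", "GKTKEAKLR"]
other_group = ["MSRLAFDEY", "GRTFEAYLR"]

freq_frequent = residue_frequencies(frequent_group)
freq_other = residue_frequencies(other_group)

# Compare lysine (K) with the residues reported to enhance MALDI ion yield:
# arginine (R), phenylalanine (F), and tyrosine (Y)
for aa in "KRFY":
    print(aa,
          round(freq_frequent.get(aa, 0.0), 3),
          round(freq_other.get(aa, 0.0), 3))
```

A real comparison would use the full sequences of the 44 proteins and of the remaining proteins, and would test whether any difference in frequency is statistically significant.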
In an e-mail, the corresponding author of the 2008 article, which found similar results, said that now that his study's findings have been confirmed, it will become more difficult for researchers to ignore the presence of these recurring proteins in their biomarker studies.
Jiri Petrak, senior researcher in pathophysiology at 1st Medical Faculty, Charles University in Prague, said the innovative aspect of the current study is that it shows "significant overlap in these 'general responders' even among species as distant as humans, worms, or fruit flies."
The most detected proteins, such as enolase, PDI, and HSP70, as well as other "suspicious" proteins should no longer be considered specific biomarkers for disease, he said, and added that the publication of the two articles signals "that proteomics is losing its childish self-fascination and [is starting to] critically reevaluate its methods and results."
Wang and her colleagues do not have any advice on how to improve the 2DE-MS workflow, and Wang said that because the over-representation of some proteins is a biological consequence, there may be nothing that can be done to the workflow to avoid it. She added that she expects LC-MS experiments to have similar results.
"We think that since this phenomenon is caused mainly by the cell stress response, very possibly when you use other methods, you still may find [that] these proteins will be frequently detected," Wang said. "I think it's very difficult to overcome this biological cellular response because you apply experimental conditions [and] the subject will [experience] stress."