Several researchers are questioning the proteomics data behind a high-profile study that implicated co-infection by a fungus and a virus as the cause of recent honey bee colony collapses.
Last October, a team of scientists from the University of Montana and the US Army's Edgewood Chemical Biological Center published a study in the journal PLoS One suggesting that the colony collapses might be caused by co-infection by the fungus Nosema and invertebrate iridescent virus, or IIV.
Using the Army's Agents of Biological Origin Identification, or ABOID, system – a mass spec-based proteomics platform for the detection of pathogenic microorganisms – the researchers identified Nosema and IIV peptides in bees from colonies that had suffered collapse, indicating that these two organisms might play a role in the disorder.
Potentially offering new insights into a phenomenon that has puzzled scientists since it was first observed in 2006, the study drew broad attention, including write-ups by the Associated Press, Discover, and the New York Times.
Since then, however, several outside scientists have called the group's findings into question, with some focusing in particular on the accuracy of their proteomic results.
Two separate commentaries – one by University of British Columbia researcher Leonard Foster published in the January edition of Molecular & Cellular Proteomics and another by University of California, San Francisco, researchers Giselle Knudsen and Robert Chalkley published in PLoS One in June – have raised concerns about the study, suggesting that the peptides identified by the authors as coming from Nosema and IIV are, in fact, better matches to honey bee proteins.
At root is the question of whether bee proteins were included in the database the paper's authors searched to make their peptide IDs. According to Knudsen and Chalkley's commentary, given the characteristics of the PeptideProphet algorithm used to confirm the peptide assignments, failure to include bee proteins in the database could lead to a significant number of incorrect identifications.
PeptideProphet uses a variety of metrics to score the reliability of a peptide identification. The software divides the results into two distributions – scores of correctly matched spectra and scores of incorrectly matched spectra – which it then deconvolves to determine the probability that a given assignment is correct.
If, however, mass spectra are searched against an inappropriate database with no matching peptides, the algorithm will have no correct assignments on which to model its statistics and will model them on incorrect matches instead, potentially leading to incorrect IDs registering as correct.
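The two-distribution idea can be illustrated with a minimal sketch: fit a two-component Gaussian mixture to search-engine scores with expectation-maximization, then read off the posterior probability that a given score came from the "correct" component. This is only an illustration of the statistical approach, not PeptideProphet's actual implementation; the simulated scores and all parameter values are invented for the example.

```python
import math
import random

def em_two_gaussians(scores, iters=200):
    """Fit a two-component 1-D Gaussian mixture by EM.
    Component 0 models incorrect matches, component 1 correct matches."""
    s = sorted(scores)
    n = len(s)
    # Initialize from the low and high halves of the sorted scores.
    mu = [sum(s[:n // 2]) / (n // 2), sum(s[n // 2:]) / (n - n // 2)]
    sigma = [1.0, 1.0]
    weight = [0.5, 0.5]
    for _ in range(iters):
        # E-step: responsibility of the "correct" component for each score.
        resp = []
        for x in scores:
            p = [weight[k]
                 * math.exp(-(x - mu[k]) ** 2 / (2 * sigma[k] ** 2))
                 / (sigma[k] * math.sqrt(2 * math.pi)) for k in range(2)]
            resp.append(p[1] / (p[0] + p[1]))
        # M-step: re-estimate weights, means, and variances.
        for k, r in ((0, [1 - w for w in resp]), (1, resp)):
            total = sum(r)
            weight[k] = total / n
            mu[k] = sum(ri * x for ri, x in zip(r, scores)) / total
            var = sum(ri * (x - mu[k]) ** 2 for ri, x in zip(r, scores)) / total
            sigma[k] = max(math.sqrt(var), 1e-3)
    return mu, sigma, weight

def prob_correct(x, mu, sigma, weight):
    """Posterior probability that score x belongs to the correct-match component."""
    p = [weight[k]
         * math.exp(-(x - mu[k]) ** 2 / (2 * sigma[k] ** 2))
         / (sigma[k] * math.sqrt(2 * math.pi)) for k in range(2)]
    return p[1] / (p[0] + p[1])

random.seed(0)
# Simulated scores: most matches are incorrect (low-scoring),
# a minority are correct (high-scoring).
scores = ([random.gauss(1.0, 0.5) for _ in range(800)]
          + [random.gauss(4.0, 0.7) for _ in range(200)])
mu, sigma, weight = em_two_gaussians(scores)
```

The failure mode the critics describe follows directly: if no correct matches exist in the data, the high-scoring component is fitted to the tail of the incorrect-match distribution, and some incorrect assignments inevitably receive high posterior probabilities.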
Critics of the PLoS One colony collapse work have suggested that this phenomenon might have led to incorrect identification of bee peptides present in the study's samples as coming from Nosema and IIV.
"If you use [PeptideProphet with] a random database with no matches, you should ideally get nothing [assigned] a high probability, so it's possible to have outcomes where no correct peptide [matches] are found," Andrew Keller, a researcher at the Institute for Systems Biology and one of the inventors of the algorithm, told ProteoMonitor.
However, he noted, "you really can't use [the program] as a black box. You have to look and see what the model actually learned, to see how it partitioned the data to see if it looks reasonable."
One red flag in the colony collapse study, he said, is the high number of missed tryptic cleavages in the peptides the researchers identified as coming from Nosema and IIV. This pattern was first pointed out in the Foster commentary, which observed that more than 60 percent of the peptides the study assigned to IIV contained two or more missed tryptic cleavage sites.
Keller, who is not associated with either group of researchers, noted that this suggests potential problems with the colony collapse group's PeptideProphet analysis.
"On average, correct results should have fewer missed tryptic cleavages than incorrect results," he said. "With a semi-tryptic database search, it is common to see 80 percent of the inferred correct results are doubly tryptic versus only 7 percent of the inferred incorrect results."
The high number of missed cleavages could be due to incomplete trypsinization, he said, "but all the learned distributions together should still indicate a reasonable partitioning of the results."
"I suspect that wouldn't be the case with [the colony collapse researchers'] PeptideProphet analysis," he added.
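Missed tryptic cleavages can be counted directly from a peptide sequence: trypsin cuts after lysine (K) or arginine (R) except when the next residue is proline (P), so any internal K or R not followed by P is a site the enzyme failed to cut. A minimal sketch (the function name and example peptides are illustrative, not from the study):

```python
def missed_cleavages(peptide):
    """Count internal K/R residues not followed by P -- tryptic
    cleavage sites that trypsin failed to cut."""
    count = 0
    for i, aa in enumerate(peptide[:-1]):  # C-terminal residue is not "missed"
        if aa in "KR" and peptide[i + 1] != "P":
            count += 1
    return count

print(missed_cleavages("LVNELTEFAK"))       # 0: fully tryptic, no internal K/R
print(missed_cleavages("AEFVEVTKLVTDLTK"))  # 1: internal K not followed by P
```

Under the pattern Keller describes, a confident result set should be dominated by peptides scoring 0 here; a set in which most peptides score 2 or more, as Foster observed for the IIV assignments, is a warning sign.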
The Foster commentary relied for its analysis on a list of the identified peptides provided by the study authors. For their critique, Knudsen and Chalkley obtained raw mass spec data from three of the bee samples used in the study, which they reanalyzed, finding that 74 of the 172 spectra previously matched to Nosema or IIV were better matches to honey bee proteins. The other 90 spectra returned no confident identification, they said.
According to University of Montana researcher Jerry Bromenshenk, lead author on the colony collapse paper and CEO of Bee Alert Technology, a bee management and health company, two of the three samples sent to Knudsen and Chalkley were healthy controls, which, he told ProteoMonitor, explains why they found no Nosema or IIV peptides.
He maintained, however, that the third sample contained both Nosema and IIV, with the presence of Nosema having been confirmed via PCR and microscopy in addition to the group's proteomic data. Knudsen and Chalkley addressed this discrepancy in their commentary, noting that their inability to find evidence of Nosema peptides in the raw mass spec data "does not preclude evidence from other work, such as genomic sequencing efforts which do support the presence of Nosema in similar samples," but that "there is no evidence for the presence of iridovirus or Nosema peptides" in the raw data they reanalyzed "and that most if not all previously identified peptides can be explained as deriving from highly abundant honey bee proteins."
However, Charles Wick, former senior scientist at the Army's Edgewood Chemical Biological Center and leader of the colony collapse study's proteomics analysis, told ProteoMonitor that the discrepancy in the two groups' findings from the three raw files stems not from poor experimental design on the part of his team but from the greater accuracy and sensitivity of the Army's ABOID software platform compared to that used by Knudsen and Chalkley.
He agreed that searching too narrow a database for peptide matches could lead to incorrect assignments, but said that the researchers, in fact, did search against a broad database including bee, human, and other proteins.
University of Montana researcher Colin Henderson – also an author on the study – seconded this statement, telling ProteoMonitor that Wick's team searched a database derived from the entire National Center for Biotechnology Information's sequence database, identifying peptides belonging to bees, humans, plants, and "a whole host of other things."
These assertions contradict not only the critiques by Foster and by Knudsen and Chalkley, but also the methods section of the study itself, which says that "the experimental MS/MS spectral data of bacterial peptides were searched using the Sequest ... algorithm against a constructed proteome database of microorganisms [emphasis added]."
Asked about this apparent contradiction, Wick, who has retired from his post at the ECBC since publication of the study, blamed it on miscommunication, calling it "a poorly written sentence."
"It could have been written better. It's one of those things you catch later," he said. "What we said there is a little complicated. It's OK to read it [as referring to a narrowly constructed database]. I'm just saying that's not what we did. It's a long paper and that's a short sentence."
Previous work using the Army's ABOID system – to which proteomics informatics firm Sage-N Research signed an exclusive license last month (PM 06/03/2011) – has used narrowly constructed databases containing only microorganisms. For instance, a December 2005 study in the Journal of Proteome Research, an April 2010 study in Applied and Environmental Microbiology, and a July 2010 study in the Journal of Proteome Research all used databases limited to either bacteria or microorganisms more broadly.
In addition to the proteomic analyses questioning the colony collapse data, a team of Columbia University and Pennsylvania State University researchers published a study in PLoS One last month in which they tried to detect IIV in collapsing bee colonies using PCR and a reanalysis of metagenomic data from earlier studies of collapsed colonies. Like the proteomic critiques, this work also found no evidence of IIV in bee samples, although, Wick noted, it used a relatively small sample size of 163 bees compared to more than 6,000 bees in the UM-Army study.
The Data Debate
Given the questions raised about the study's results, a number of researchers have requested access to the raw mass spec files the UM-Army team used to make its peptide identifications. Thus far, however, only the three files used by Knudsen and Chalkley in their analysis have been made available to the larger scientific community.
Requests for this data began shortly after the study was published in October, said PLoS One executive editor Damian Pattinson, noting that several researchers contacted him then to say that they were having difficulty obtaining raw mass spectrometry data from the study in order to confirm the results.
Pattinson "contacted the authors pretty quickly after publication to request" the data, he told ProteoMonitor. The authors, he said, told him that a US Army technical report on the paper was awaiting clearance and would provide the data in question once released. In the meantime they distributed the Excel document containing the identified Nosema and IIV peptides upon which Foster based his critique.
The Army report became available in December. However, Pattinson said, "judging by the comments on the [original] paper and the reanalyses, it doesn't sound like [the report] includes everything [that has been requested]. So I think we would still want the authors to share the data more fully as is our policy."
He said that he plans to contact the authors again shortly and ask them for the full sets of raw mass spec data.
The difficulty of obtaining this data may be due in part to something of a cultural misunderstanding, Pattinson suggested. Typically in fields like proteomics, a high priority is put on sharing of raw data, "while in an area like ecology, for example, there isn't really that culture of sharing [raw data] in that way," he said.
More generally, Pattinson noted, such disputes about proteomics data are likely to become more common as scientists outside the field increasingly adopt proteomics tools to further their research.
"It's a big topic for [PLoS One], and we're always looking at improving how we deal with data sharing," he said. "I think there are those sorts of [cultural] differences, and there's a long way to go to resolving those kinds of issues."
Indeed, when asked about releasing the raw mass spec data, Bromenshenk questioned how the researchers could distribute a file that large, saying that "PLoS One has no place that could hold that big a file."
Pattinson noted that he was aware of this objection but said that the Tranche proteomics data repository, which is commonly used for storing such files, "should certainly be able to deal with it."
"If they prefer they can always either deposit it elsewhere or they can send it to us and we can send it out," he added.
Wick, however, questioned why the authors should release additional raw data given that, in his view, the researchers who reanalyzed it got their analysis wrong.
"The raw mass spec files that we shared with Chalkley for comparison purposes, they got wrong," he said. "I don't think the Army is going to be very interested in somebody whining about datasets."
"From our point of view, we found it, they need to go find it, and sorry for them if they don't have the tools," he said. "Sorry they can't see it. That's not my problem."
According to Bromenshenk, the authors are preparing a rebuttal to the critiques leveled against the study. Pattinson declined to comment on whether it was currently under review at PLoS One.
Have topics you'd like to see covered in ProteoMonitor? Contact the editor at abonislawski [at] genomeweb [.] com.