Biognosys, a Zurich, Switzerland-based proteomics firm, has released an open-source software platform that analyzes data gathered from selected reaction monitoring mass spectrometry — a method for detecting and quantifying target proteins.
In a paper published last month in Nature Methods, Biognosys researchers and colleagues from the Institute of Molecular Systems Biology at the Swiss Federal Institute of Technology (ETH), Zurich, wrote that the software, called mProphet, uses "a probabilistic scoring model" to integrate "multiple dimensions of information available in SRM data ... for the automated, objective, flexible, and consistent scoring of SRM datasets."
Biognosys spun out of ETH Zurich in 2008. The firm focuses on providing proteomics-based contract research services, laboratory materials, and proprietary technology for mass spectrometry for pharma, diagnostics, and healthcare as well as biotechnology and agricultural companies.
The firm's revenues come from sales of customized protein assays and data analysis services. It also sells pre-made assays, but Oliver Rinner, Biognosys' president and chief scientific officer and one of the software developers, said the bulk of the business comes from clients whose projects require specialized assays designed for particular proteins.
While bioinformatics is a key component of its business, "we are not a software company," Rinner told BioInform.
"Our intent with this paper [was] to document our skills and knowledge in this field," he explained, adding that the firm plans to use mProphet for its in-house projects as well as for customers' data analysis needs.
He noted that because the market for software like mProphet is small, it didn't make sense to release it as a commercial package. Doing so would require that Biognosys reinvent itself as an informatics company as well as repackage the software in a form that can easily be installed and used on a desktop computer — factors that aren’t in keeping with its current business model.
Biognosys currently has 10 employees but is in the process of closing a round of funding for an undisclosed amount. Rinner said that once it does so, the company plans to hire additional staff, including a web developer and a lab technician.
Under the Hood: Decoy Transitions
SRM experiments are made up of two major steps. The first involves designing an assay that identifies the specific protein of interest. These assays record several transitions, each of which is described as a pair comprising a precursor ion of the target protein and a diagnostic fragment ion signal. The second is analyzing the data these assays produce, a dataset composed of true and false peaks.
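As a rough illustration of the transition concept described above, a transition can be modeled as a simple pair of values. This is a minimal sketch only: the class names, field names, target label, and m/z numbers are all invented for the example and are not taken from mProphet or the paper.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Transition:
    """One SRM transition: a precursor ion paired with a fragment ion."""
    precursor_mz: float      # m/z of the precursor ion (made-up value below)
    fragment_mz: float       # m/z of the diagnostic fragment ion (made-up)
    is_decoy: bool = False   # True for a transition built as a decoy


@dataclass
class Assay:
    """All transitions recorded for one target (hypothetical container)."""
    target: str
    transitions: List[Transition] = field(default_factory=list)


# Illustrative numbers only; they do not describe a real assay.
assay = Assay(
    target="EXAMPLE_TARGET",
    transitions=[
        Transition(precursor_mz=500.25, fragment_mz=700.40),
        Transition(precursor_mz=500.25, fragment_mz=587.32),
        Transition(precursor_mz=500.25, fragment_mz=474.23),
    ],
)

print(len(assay.transitions))
```

Each transition shares the same precursor value here because, in this sketch, all three belong to one target; the fragment value is what distinguishes them.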
At this point, researchers run smack into an analysis bottleneck, according to the authors of the Nature Methods paper, because there are more than 1,000 transitions that can be measured in a single run, which makes manually going through the results impractical.
"The mProphet tool will avoid this pitfall for targeted proteomics and make data generated in different laboratories directly comparable," said co-author Ruedi Aebersold, an ETH professor and one of mProphet's developers, noting that the software will "allow users to generate statistically validated data sets right from the outset."
MProphet uses "a decoy-transition approach," which Rinner described as constructing transitions for proteins that aren’t present in the sample. To distinguish the true peak from these decoys and other "false" peaks, the method combines subscores for different features in the data structure, such as the shape of the peaks, into a single score that can be used to identify the peak that represents the right protein.
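The idea of folding several subscores into one score can be sketched as a weighted sum. The article does not say how mProphet combines its subscores, so the weighted sum, the subscore names other than peak shape, and the weights below are assumptions chosen for illustration; in a real tool the weights would be learned from the data rather than fixed by hand.

```python
from typing import Dict

# Hypothetical weights, fixed by hand for this example.
WEIGHTS: Dict[str, float] = {
    "peak_shape": 0.5,
    "coelution": 0.3,               # assumed subscore, not named in the article
    "intensity_correlation": 0.2,
}


def combined_score(subscores: Dict[str, float],
                   weights: Dict[str, float] = WEIGHTS) -> float:
    """Collapse per-feature subscores into one number via a weighted sum."""
    return sum(weights[name] * subscores[name] for name in weights)


# Made-up subscores for one plausible true peak and one decoy peak.
target_peak = {"peak_shape": 0.9, "coelution": 0.8, "intensity_correlation": 0.7}
decoy_peak = {"peak_shape": 0.2, "coelution": 0.4, "intensity_correlation": 0.1}

print(combined_score(target_peak) > combined_score(decoy_peak))
```

A single number per peak is what makes ranking possible: once every peak has one score, true and decoy peaks can be compared on the same scale.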
The process begins by splitting the dataset into training and test datasets. In the training dataset, the peaks with the highest subscores are taken as true peaks and, together with the false or decoy peaks, are used to learn the model iteratively.
The authors explain further that within each "transition group record," defined as a record of all the possible transitions for a peptide, only one of the peaks can be true; that peak is ranked highest and is the one used in each iteration of the learning process.
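The step of keeping only the top-ranked peak in each transition group record can be sketched as follows. The dictionary layout, group labels, and scores are hypothetical, and this shows only the selection step, not the learning procedure itself.

```python
from typing import Dict, List


def best_peak_per_group(peaks: List[dict]) -> Dict[str, dict]:
    """Return the highest-scoring peak for each transition group."""
    best: Dict[str, dict] = {}
    for peak in peaks:
        group = peak["group"]
        # Keep this peak if the group is new or it beats the current best.
        if group not in best or peak["score"] > best[group]["score"]:
            best[group] = peak
    return best


# Made-up candidate peaks for two hypothetical transition groups.
peaks = [
    {"group": "peptide_A", "rt": 12.1, "score": 0.4},
    {"group": "peptide_A", "rt": 15.7, "score": 2.9},
    {"group": "peptide_B", "rt": 8.3, "score": 1.5},
    {"group": "peptide_B", "rt": 9.0, "score": 0.2},
]

top = best_peak_per_group(peaks)
print(top["peptide_A"]["rt"], top["peptide_B"]["rt"])
```

In an iterative scheme, this selection would be repeated after each round of rescoring, since the top-ranked peak in a group can change as the model changes.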
The overall computed score of the highest-ranked peaks from each group is termed the mProphet score and is used to determine the number of false positives in the sample.
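One common way to turn decoy scores into an estimate of false positives is the generic target-decoy calculation sketched below. It is not necessarily the estimator mProphet uses; the function name, the scaling by the target-to-decoy ratio, and the scores are assumptions for illustration.

```python
from typing import Sequence


def estimate_fdr(target_scores: Sequence[float],
                 decoy_scores: Sequence[float],
                 cutoff: float) -> float:
    """Estimate the false discovery rate among targets scoring >= cutoff."""
    if not decoy_scores:
        raise ValueError("at least one decoy score is required")
    n_target = sum(1 for s in target_scores if s >= cutoff)
    n_decoy = sum(1 for s in decoy_scores if s >= cutoff)
    if n_target == 0:
        return 0.0
    # Scale decoy hits in case the two sets differ in size.
    ratio = len(target_scores) / len(decoy_scores)
    return min(1.0, n_decoy * ratio / n_target)


# Made-up top-peak scores for five target groups and five decoy groups.
targets = [3.1, 2.8, 2.5, 0.4, 0.1]
decoys = [0.9, 0.3, 0.2, -0.5, -1.0]

print(estimate_fdr(targets, decoys, cutoff=0.5))
print(estimate_fdr(targets, decoys, cutoff=2.0))
```

The reasoning is that decoys cannot be true hits, so the number of decoys passing a cutoff approximates how many false target hits pass the same cutoff. Raising the cutoff trades fewer identifications for a lower estimated error rate.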
To ensure that mProphet worked as it should, the authors tested their software on SRM assays generated from a gold standard reference sample, which they created from 100 chemically synthesized peptides. They reported that the tool "separated true from false very well" and that the combined score showed the "best separation power" compared to results based on single scores.
They also tested the tool with a more complex sample containing 591 peptides that corresponded to 265 proteins. They reported that mProphet was able to identify 457 peptides and 238 proteins with high confidence.
The investigators claim that mProphet improves on current data analysis software for SRM. For example, the University of Washington's Skyline — which "uses the dot product of relative fragment ion intensities between a spectrum library and an SRM measurement to assess peak group quality" — requires a spectrum library, for one thing, and does not provide a false discovery rate. Another tool, the Broad Institute's AuDIT, provides scores that have "limited power for peptide identification," they wrote.
Lukas Reiter, a research and development associate at Biognosys and one of the authors on the paper, told BioInform via e-mail that comparisons between Skyline, AuDIT, and mProphet looked only at "sub-aspects" of the programs because Skyline and AuDIT weren’t designed primarily for protein identification from SRM data.
Reiter explained that since Skyline relies on the correlation between fragment ions and spectrum libraries to score signals, the authors compared its score "in terms of discrimination power" to mProphet's subscores as well as the combined score.
"The result was that the intensity correlation score was not the most powerful among the full set of scores and that our combined score topped all other scores," he said.
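The kind of intensity-correlation score discussed above, a dot product of relative fragment ion intensities between a library spectrum and an SRM measurement, can be illustrated with a normalized dot product. This is a generic sketch, not Skyline's actual implementation; the function name and the intensity values are invented.

```python
import math
from typing import Sequence


def normalized_dot_product(library: Sequence[float],
                           measured: Sequence[float]) -> float:
    """Cosine-style similarity between two fragment intensity vectors."""
    if len(library) != len(measured):
        raise ValueError("intensity vectors must have the same length")
    dot = sum(a * b for a, b in zip(library, measured))
    norm = (math.sqrt(sum(a * a for a in library))
            * math.sqrt(sum(b * b for b in measured)))
    # An all-zero vector has no direction, so report no similarity.
    return dot / norm if norm else 0.0


# Made-up relative intensities for three fragment ions.
library_intensities = [1.0, 2.0, 2.0]
matching_measurement = [2.0, 4.0, 4.0]     # same proportions, twice as intense
mismatched_measurement = [2.0, -1.0, 0.0]  # contrived to be orthogonal

print(normalized_dot_product(library_intensities, matching_measurement))
print(normalized_dot_product(library_intensities, mismatched_measurement))
```

Because the vectors are normalized, the score depends on the relative pattern of fragment intensities rather than on absolute signal strength, which is why a measurement with the same proportions scores as a match regardless of overall intensity.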
For AuDIT, which was developed to detect transitions with interferences, Reiter and his colleagues looked at whether its scores could also be used to identify the true protein in the sample.
Their findings, he said, showed that when used to identify proteins, these scores "have a low discrimination power" compared to mProphet's scores.
He noted that AuDIT's scores can only be derived if the samples were analyzed in "technical replicates and if synthetic reference peptides were spiked into the sample."
Reiter added that during mProphet's development, the group paid special attention to its compatibility with the "most used labeling work flows and prior knowledge."
For instance, while mProphet does not rely on the availability of spectrum libraries, unlike Skyline, it can use the information if it is available. Furthermore, the tool can compute scores if reference peptides are spiked into the sample but it still works even if the reference information isn't available, he said.
MProphet is one of several open-source software modules available from Biognosys for protein analysis.
Other modules include mGen, which generates decoy transitions for a list of target transitions; mMap, which links the raw data in the mzXML format with MRM metadata; mQuest, which scores MRM data based on different experimental workflows; and mInteract, which coordinates mMap, mQuest, and mProphet and enables a one-command analysis.
The firm says its software is useful for large-scale projects in industrial, academic, and clinical research organizations for drug discovery, diagnostics, gene expression, crop science, and metabolic research, among others.