Proteomics researchers can soon expect a new add-on from the National Cancer Institute for Thermo’s BioWorks protein identification software that will help them sift through tandem mass spec results from hundreds of ion trap runs.
The software package, called Proteomic Toolkit for Protein Identification and Quantification, or ProToPIQ, was developed by Aaron Lucas, a bioinformaticist at SAIC Frederick, an NCI Frederick contractor.
A single experiment, Lucas said, can result in up to 500 mass-spec runs, each of which can yield a list of up to 3,000 peptide IDs once it has been searched with TurboSequest, the search algorithm that comes with the BioWorks package. “There is a lot of redundancy in those exports [and] there is no quick and easy way to manipulate [them],” he said. BioWorks does not permit multiple peptide ID files to be combined easily, which makes analyzing and mining data from multiple experiments problematic.
ProToPIQ can pool peptide ID lists from multiple runs into a single Access file, so that users can manipulate the data, for example by eliminating redundant IDs or quantifying peptides, and export the results as an Excel spreadsheet or other document type. No commercial software to date offers this, according to Lucas.
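To illustrate the idea of pooling and de-duplicating peptide IDs, here is a minimal Python sketch. It is not ProToPIQ's code, which has not been released. The `PeptideHit` record, its field names, and the rule of keeping the highest-Xcorr hit per protein and peptide pair are all assumptions made for the example; real BioWorks exports have a different layout.

```python
from typing import Dict, Iterable, List, NamedTuple, Tuple


class PeptideHit(NamedTuple):
    """Hypothetical, simplified peptide ID record (not the BioWorks export format)."""
    run: str       # name of the mass-spec run the hit came from
    protein: str   # protein accession
    peptide: str   # peptide sequence
    xcorr: float   # Sequest cross-correlation score


def pool_runs(runs: Iterable[List[PeptideHit]]) -> List[PeptideHit]:
    """Merge per-run hit lists, keeping the best-scoring hit per (protein, peptide)."""
    best: Dict[Tuple[str, str], PeptideHit] = {}
    for run in runs:
        for hit in run:
            key = (hit.protein, hit.peptide)
            # Assumed redundancy rule: a repeat ID survives only if it scores higher.
            if key not in best or hit.xcorr > best[key].xcorr:
                best[key] = hit
    # Sort for a stable, readable table.
    return sorted(best.values(), key=lambda h: (h.protein, h.peptide))


example_runs = [
    [PeptideHit("run1", "P1", "AAK", 2.1), PeptideHit("run1", "P2", "LLR", 3.0)],
    [PeptideHit("run2", "P1", "AAK", 2.9)],
]
pooled = pool_runs(example_runs)
```

In this example the two hits for peptide `AAK` collapse into one row, taken from `run2` because it has the higher score.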
The program offers two main data-processing tools. “Global Proteome Survey” permits researchers to identify relevant peptides by filtering the data according to the digestion enzyme they used in the experiment. They can choose from seven options: fully or partially tryptic; fully or partially elastase-cleaved; fully or partially chymotryptic; or no protease. The advantage of this feature, according to Lucas, is that “some enzymes are not as specific as others, [so] we feel that it is necessary to judge peptide identifications, especially from biofluids, with different Xcorr criteria.”
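The kind of filtering Lucas describes can be sketched as follows for the tryptic case. This is an illustration under stated assumptions, not the Global Proteome Survey implementation: peptides are assumed to be written with flanking residues as `K.PEPTIDER.A`, trypsin is treated as cleaving after K or R with no proline exception, and the Xcorr cutoffs in `ILLUSTRATIVE_MIN_XCORR` are invented placeholders rather than values from NCI.

```python
from typing import Dict


def tryptic_termini(annotated: str) -> int:
    """Count tryptic termini (0, 1 or 2) for a peptide written as 'K.PEPTIDER.A'.

    '-' as a flanking residue marks the start or end of the protein.
    """
    prev_res, peptide, next_res = annotated.split(".")
    n_term_ok = prev_res in ("K", "R", "-")
    c_term_ok = peptide[-1] in ("K", "R") or next_res == "-"
    return int(n_term_ok) + int(c_term_ok)


# Placeholder cutoffs: 2 = fully tryptic, 1 = partially tryptic, 0 = non-tryptic.
# The numbers are made up; only the idea of stricter cutoffs for less specific
# cleavage comes from the article.
ILLUSTRATIVE_MIN_XCORR: Dict[int, float] = {2: 1.5, 1: 2.5, 0: 3.5}


def passes_filter(annotated: str, xcorr: float,
                  thresholds: Dict[int, float] = ILLUSTRATIVE_MIN_XCORR) -> bool:
    """Apply the Xcorr cutoff that matches the peptide's cleavage specificity."""
    return xcorr >= thresholds[tryptic_termini(annotated)]
```

With these placeholder cutoffs, `passes_filter("K.LVNELTEFAK.T", 2.0)` accepts a fully tryptic peptide, while the same score on the partially tryptic `A.LVNELTEFAK.T` is rejected.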
In addition, a quantitation module lets users import and analyze data from multiple ICAT- or O-18-labeling experiments and create user-defined tables of over- and underexpressed proteins. Thermo’s own quantitation tool, Xpress, can only work on an individual file.
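A rough sketch of how such a table might be built is shown below. The article does not describe how ProToPIQ computes it, so everything here is an assumption: peptide-level heavy/light ratios pooled across experiments, a per-protein median, and a user-chosen fold-change cutoff (`fold_cutoff`, defaulting to 2.0).

```python
from collections import defaultdict
from statistics import median
from typing import Dict, Iterable, List, Tuple


def classify_proteins(peptide_ratios: Iterable[Tuple[str, float]],
                      fold_cutoff: float = 2.0) -> Dict[str, Tuple[float, str]]:
    """Summarize peptide ratios per protein and label each protein.

    peptide_ratios: (protein accession, heavy/light ratio) pairs, pooled
    from any number of labeling experiments.
    Returns {protein: (median ratio, "over" | "under" | "unchanged")}.
    """
    by_protein: Dict[str, List[float]] = defaultdict(list)
    for protein, ratio in peptide_ratios:
        by_protein[protein].append(ratio)

    table: Dict[str, Tuple[float, str]] = {}
    for protein, ratios in by_protein.items():
        m = median(ratios)
        if m >= fold_cutoff:
            label = "over"
        elif m <= 1.0 / fold_cutoff:
            label = "under"
        else:
            label = "unchanged"
        table[protein] = (m, label)
    return table


example = [("P1", 2.5), ("P1", 3.5), ("P2", 0.4), ("P2", 0.5), ("P3", 1.1)]
result = classify_proteins(example)
```

The median is used here only because it is robust to a single outlying peptide ratio; a mean or a weighted average would fit the same structure.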
ProToPIQ also allows users to retrieve protein names from protein database queries, rather than just accession numbers. BioWorks delivers protein names from some, but not all, protein databases, according to Lucas.
Interest in the software, which Lucas’ lab uses on a daily basis, has been “great,” he said, both from academic researchers and from pharmaceutical companies: “Almost every person who has seen it work has been very, very excited about it.”
However, although NCI formally offered ProToPIQ for licensing in July, potential users will have to be patient for a little longer. “I don’t want to throw the source code out until I [prepare] a paper or a tutorial on how to use it,” said Lucas, adding that he plans to write something up by early next year.
In addition, he has yet to update the software to recognize exports from BioWorks 3.2, an upcoming version of the software that supports data from Thermo’s new LTQ and LTQ-FT instruments. Lucas said he is also working on a version of the software that will work with other peptide mass spec search engines, such as Mascot.
Academic users will be able to license the software for free, while companies will be able to purchase a non-exclusive user license on a per-seat basis, according to an NCI licensing officer.