Fred Hutchinson and NCI Launch Software For Sharing, Managing Proteomics Data


Fred Hutchinson and the National Cancer Institute last week released a free, open-source software platform intended to make it easier for researchers to manage and share data during the proteomic discovery process.

The software, called the Computational Proteomics Analysis System (CPAS), is a web-based system designed to complement existing repositories for proteomics data, such as PRIDE, PeptideAtlas, and GPM, according to a paper on CPAS that was published Dec. 8 in the online version of the Journal of Proteome Research.

"We're not out there to replace the current repositories," said Martin McIntosh, an associate member of the Fred Hutchinson Cancer Research Center who led the development of CPAS. "We're trying to do stuff that's available before [the data is published in repositories]. If researchers want to publish their raw findings on CPAS, they can."

Databases such as GPM and PeptideAtlas process data in a standard way, then integrate results, McIntosh explained, while PRIDE adds extensive annotations to protein findings. The data processing and annotation take time, so it is inconvenient for researchers to use these repositories to share data while they are still in the process of proteomic discovery, he said.

When researchers want to share data, they typically zip up the data into a big file and send it via file transfer protocol, McIntosh said. Because researchers have different data-analysis and data-management systems, it often takes a lot of human intervention to get data into another researcher's system, he added.

"It's not easy to share data," said McIntosh. "Some people use systems that are very idiosyncratic."

With CPAS, data is uploaded and downloaded in a standard format, and an analysis tool allows users to analyze data using any search engine that is currently available. Another tool allows users to annotate data.

There is no quality control of data that is put onto CPAS, McIntosh said.

"People can evaluate the quality for themselves," said McIntosh. "It was not our responsibility to build a system that would impose notions of quality."

A "permissions" tool in the CPAS system allows researchers to control whom they share data with, McIntosh added.

"Whether they choose to let the world see [the data], or only someone down the hall, is up to them," he said.

The CPAS project was initiated after the Fred Hutchinson Cancer Research Center received a two-year grant from the National Cancer Institute to build "infrastructure and resources" for clinical proteomics, McIntosh said. The first year of the NCI project was recently completed.

"As a requirement of our contract, we were required to develop any type of software tool to be made available in an open source, free format," he explained. "We decided the best way was to create a single, integrated platform for integrating tools and data analysis for proteomic mining."

The development of the FuGE data standards model (see ProteoMonitor 6/3/2005) was critical for the development of CPAS, McIntosh said. The model provides a standard data format for proteomics experiments, as well as other types of experiments, including genomics, transcriptomics, and metabolomics experiments.

In addition to implementing FuGE, CPAS adopted pepXML, a data format developed at the Institute for Systems Biology into which output from any search engine can be transformed.

"We went out there and saw all the tools for proteomics, and we wanted to integrate the components together," said McIntosh. "We were able to build something that has the components researchers need: an analysis tool, an experimental-annotation tool, and a data-mining tool."

Version 1.0 of CPAS was made available only to researchers at the Fred Hutchinson center. So far, there are about 7,500 tandem mass spectrometry experiments stored in the center's CPAS system, McIntosh said.

About six months ago, CPAS was also released to a few external users who agreed to evaluate the software, including David States' research group at the University of Michigan and the software companies GenoLogics and LabKey.

McIntosh said version 1.1 of CPAS was released to the public last week because "it had a certain level of maturity."

So far, users at about 20 different sites have logged in to the CPAS site to get an account to download the system, McIntosh said.

CPAS developers have no particular strategies in mind to try to get people to use their system, other than to make the software useful, McIntosh said.

"A lot of small labs that don't have large budgets for developing bioinformatics platforms will find it particularly useful," he said.

The CPAS software will be updated several times per year, McIntosh said. Researchers are currently working on adding a function for handling quantitation data from isotopic labeling experiments, and a function for evaluating MALDI data.

"There are a lot of functionalities that need to be added for wider adoption," said McIntosh. "We plan to develop the software using a process that is as open as possible."

Within the Fred Hutchinson center, Amanda Paulovich has been heading up a laboratory component of the CPAS project. She and her colleagues are studying 10 mouse models of cancer, and depositing data into the CPAS system. They aim to come up with a well-defined, reproducible procedure for profiling the serum of mouse models of cancer.

CPAS runs on Windows, Linux, and Macintosh systems and works with PostgreSQL and MS SQL Server databases, among others.

By Tien Shun Lee
