Proteomics researchers should expect new software tools early next year based on a new data standards model called the Functional Genomics Experiment model, or FuGE, according to FuGE developer Angel Pizarro.
The new software should help proteomics researchers design and interpret experiments so that they adhere to the Minimum Information About a Proteomics Experiment, or MIAPE, data standards. It should also help researchers in other functional genomics fields, such as genomics, transcriptomics, and metabolomics, adhere to the data standards of those fields.
"One of the long-term goals for the Proteomics Standards Initiative is to bring all data standards under one roof," said Pizarro, the director of the bioinformatics facility at the University of Pennsylvania's Institute for Translational Medicine and Therapeutics, who gave a talk on FuGE during the PSI's spring workshop in Siena, Italy. "If parties are working within the same data format, I think it will help determine what are the statistically significant results at the end of the day. It will help, for example, RNA expression researchers and proteomics researchers to link together data from experiments in a systems biology way."
The idea of FuGE came about when groups working on the microarray and gene expression data model MAGE and groups working on the proteomics data model PEDRo realized that there were some overlapping segments that were common to any wet-lab research. The groups began talking with each other and with PSI leaders about developing a model that would work with both microarray and proteomics data.
Last September, a number of researchers came together and began working on FuGE. The researchers come both from academic institutions, including Stanford, the University of Pennsylvania, the University of Glasgow, and the University of Edinburgh, and from industry, including Rosetta Biosoftware, Affymetrix, GenoLogics, Applied Biosystems, Agilent, and Thermo Electron. In addition, the non-profit European Bioinformatics Institute has contributed significantly to the FuGE effort.
"This is something that a lot of people have put their hearts into," said Pizarro.
The FuGE developers met in April during PSI's spring workshop. They plan to meet again during the first week of August to finish determining the standards, and by the first week of September, the group hopes to have a framework for producing XML schema that will give developers a platform for developing FuGE-based software tools.
FuGE itself does not actually define reporting requirements for different technologies, such as gel preparation and mass spectrometry, but instead provides a framework to support common structures and formats for reporting, Pizarro said.
"The grand vision is to be able to provide this framework that people would be able to use as a standards model when developing software," he explained.
FuGE is simpler and more stripped down than MAGE, Pizarro said. It incorporates some of the circular workflows of PEDRo, which reflect the way proteomic experimental processes produce data and materials that in turn feed further processes, yielding more data and more materials.
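To make the idea of a circular workflow concrete, here is a minimal Python sketch in which each step consumes materials or data and produces new materials or data that later steps can consume. It is an illustration only, not the FuGE or PEDRo schema: the class names `Item` and `Step`, the function `lineage`, and the sample names are all assumptions made up for this example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Item:
    """A material (e.g. a sample) or a data set. Hypothetical, not a FuGE class."""
    name: str
    kind: str  # "material" or "data"


@dataclass
class Step:
    """One application of a protocol: consumes items and produces new items."""
    protocol: str
    inputs: List[Item]
    outputs: List[Item] = field(default_factory=list)


def lineage(item: Item, steps: List[Step]) -> List[str]:
    """Return the protocols that led to `item`, earliest first."""
    for step in steps:
        if any(out is item for out in step.outputs):
            trail: List[str] = []
            for source in step.inputs:
                trail.extend(lineage(source, steps))
            return trail + [step.protocol]
    return []  # the item was not produced by any recorded step


# Illustrative chain: material -> material -> data -> data.
lysate = Item("lysate", "material")
gel_spot = Item("gel spot 12", "material")
spectrum = Item("spectrum 12", "data")
peptides = Item("peptide list 12", "data")

steps = [
    Step("gel separation", [lysate], [gel_spot]),
    Step("mass spectrometry", [gel_spot], [spectrum]),
    Step("database search", [spectrum], [peptides]),
]

print(lineage(peptides, steps))
```

Because outputs of one step are simply inputs to another, the same two classes can describe a gel-based workflow or a microarray workflow, which is the kind of overlap the FuGE groups set out to capture.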
One of the key points of FuGE is that it will deliver an ontology model that is common across the various technologies of genomics, proteomics, metabolomics, and transcriptomics experiments.
"Ontologies, by their nature, are very system-specific and can't be viewed outside of the system they were based in," said Pizarro. "What FuGE will do is reference external ontologies. It will query semantic validity in whatever system [the terminologies] were developed in. The hope is to apply something like a semantic web to try to glue everything together so that you can do content-based queries."
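One way to picture referencing an external ontology, rather than copying its terms into the model, is sketched below in Python. This is a hedged illustration, not FuGE's actual mechanism: `OntologyTerm`, `is_valid`, the ontology name, and the accession strings are hypothetical, and a real system would query the external ontology itself instead of an in-memory dictionary.

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass(frozen=True)
class OntologyTerm:
    """A pointer to a term that lives in an external ontology (hypothetical)."""
    source: str     # name of the external ontology
    accession: str  # identifier of the term inside that ontology
    label: str      # human-readable name, kept only for display


def is_valid(term: OntologyTerm, sources: Dict[str, Set[str]]) -> bool:
    """Check the term against the ontology it claims to come from."""
    known = sources.get(term.source)
    return known is not None and term.accession in known


# Stand-in for external ontologies; the names and accessions are invented.
external_sources = {
    "example-instrument-ontology": {"EX:0001", "EX:0002"},
}

good = OntologyTerm("example-instrument-ontology", "EX:0001", "mass spectrometer")
bad = OntologyTerm("example-instrument-ontology", "EX:9999", "unknown device")

print(is_valid(good, external_sources))
print(is_valid(bad, external_sources))
```

The model stores only the reference, so validity is always judged by the system in which the terminology was developed.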
The software vendors that will benefit most directly from FuGE are LIMS developers, who deal with sample management and tying data together, Pizarro said. Developers of more specialized software, such as mass spec software, are concentrating more on the open data format mzData so that users can share data more easily (see ProteoMonitor 11/19/2004).
"GenoLogics has been looking at [FuGE] very closely, and I don't see why ABI wouldn't use it for their up and coming LIMS system," said Pizarro.
One of the biggest problems facing proteomics researchers is that their methods are not yet up to dealing with high-throughput analysis, and a lot of data is replicated, Pizarro said.
"Proteomics is starting to push its boundaries in terms of data size, speed and production of data. It's really exciting to be at the frontier of dealing with those boundaries," he added.
More information about FuGE is available on the project's main website, http://fuge.sourceforge.net/. Pizarro said he hopes to publicize the format through marketing efforts.
"One of the reasons MAGE failed to get its message across is because of lack of marketing," he said. "What I'd like to see for FuGE is widespread adoption. I will make it my personal goal to get the word out and to explain to software developers in plain language how FuGE is useful to them as something that actually works and something that they can use immediately."
Tien Shun Lee