A recent editorial in Nature suggested that the Human Proteome Organization was at last beginning to have some effect — and that proteomics may have some future after all. HUPO has been on the scientific scene for almost five years now. It was launched with some fanfare in 2001 as the sole organization to speak for the proteomics community. The hope was that by having a single umbrella organization, a field that appeared to be a profusion of methods, protocols, and reporting styles could be rationalized to create a more monolithic practice.
The HUPO project of relevance to informatics is the Proteomics Standards Initiative. PSI chose to use the experience gained from nucleic acid microarray-based technologies as a guide for proteomics informatics, specifically using MIAME, the Minimum Information About a Microarray Experiment standard, as a starting point. After all, microarray data was rapidly accepted by the biological community, so following the same path should shepherd proteomics toward the same happy acceptance. It seemed to work. PSI had a quick success: the definition of the Molecular Interaction standard format, known as MI XML, for the communication of protein-protein interaction information between the databases that store that information. BIND, MIPS, MINT, DIP, and IntAct all agreed that such a standard was desirable, and each has since made some effort to support the standard.
Fortunately for PSI, the interchange of this data was relatively simple. The interaction databases already existed, and they all held roughly the same set of information about protein-protein interactions. The schema for PSI MI XML was simple and captured all the information necessary. There was no attempt to alter how the information was held in individual databases; the only practical requirement for conformance was a simple translation layer between the internal database representation of the information and the XML representation.
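The translation layer in question amounts to little more than a field-by-field mapping. The sketch below illustrates the idea; the internal record layout is hypothetical, and the element names only loosely echo the real PSI MI XML vocabulary rather than the full schema.

```python
# Sketch of the kind of translation layer PSI MI conformance required:
# map a database's internal interaction records onto a shared XML vocabulary.
# The internal record layout is hypothetical, and the element names are a
# simplified echo of the PSI MI schema, not the actual standard.
import xml.etree.ElementTree as ET

# Hypothetical internal representation: each database stores roughly the
# same facts about an interaction, just under its own field names.
internal_records = [
    {"id": 1, "bait": "P04637", "prey": "Q00987", "method": "two hybrid"},
    {"id": 2, "bait": "P38398", "prey": "Q86YC2", "method": "coimmunoprecipitation"},
]

def to_psi_mi(records):
    """Translate internal records into a simplified PSI-MI-style XML tree."""
    entry_set = ET.Element("entrySet")
    interaction_list = ET.SubElement(entry_set, "interactionList")
    for rec in records:
        interaction = ET.SubElement(interaction_list, "interaction", id=str(rec["id"]))
        ET.SubElement(interaction, "interactionDetectionMethod").text = rec["method"]
        participants = ET.SubElement(interaction, "participantList")
        for role in ("bait", "prey"):
            participant = ET.SubElement(participants, "participant", role=role)
            ET.SubElement(participant, "interactorRef").text = rec[role]
    return entry_set

xml_text = ET.tostring(to_psi_mi(internal_records), encoding="unicode")
```

Because every participating database held essentially the same facts, each could write such a mapping once and leave its internal storage untouched.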
But the core of proteomics information, protein identification and quantification, has proven to be a much more difficult nut to crack. By concentrating on the planned reporting standard, MIAPE (Minimum Information About a Proteomics Experiment), the PSI group has come up against the fundamental differences between proteomic and transcriptomic analysis, which confound the use of MIAME-style informatics in proteomics.
Not So Straightforward
In nucleic acid arrays the fundamental data is simple to model, and the experiments themselves could serve as examples in a Unified Modeling Language textbook. An array consists of oligonucleotides (20- to 30-mers) printed onto a planar substrate in discrete spots, with each spot composed of a single nucleotide sequence. If some of the RNA in a sample hybridizes with one of these oligonucleotide spots, a signal with a dynamic range of about 1,000 is generated. All of the signals can be read and interpreted simultaneously. Even though the information processing can be quite complex, the only possible output is a list of gene names and corresponding intensities. Because the output is so simple, MIAME concentrated on recording the details of the wet lab portion of the experiment, which can be somewhat more difficult to capture in a machine-readable format.
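That output can be stated in a few lines of code: the entire informatic payload of an array experiment reduces to a flat mapping from gene names to intensities. The gene names and values below are invented for illustration.

```python
# The whole result of a microarray experiment, reduced to its essentials:
# one intensity per gene, and nothing else to model.
# Gene names and intensity values are invented for illustration.
array_result = {
    "TP53": 812.4,
    "BRCA1": 96.7,
    "GAPDH": 2940.0,  # near the top of the roughly 1,000-fold dynamic range
}

# Everything downstream (normalization, clustering, reporting) consumes
# this flat map, which is why MIAME could focus on the wet lab instead.
top_gene = max(array_result, key=array_result.get)
```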
Proteomics has not developed a similarly simple, unambiguous experiment. Anyone who attempts to publish results that are the exact equivalent of microarray experiments — the identification and quantification of a gene product by the detection of single 7- to 10-mer oligopeptides — should be prepared to be excoriated by reviewers and dismissed as an old-fashioned crackpot. Single peptide signals even have their own disparaging nickname: one-hit wonders.
Instead of a single coherent experimental protocol for the identification and quantification of the gene products present in a sample, proteomics has branched out into a large number of different protocols and experiments with myriad goals. Concepts totally unknown in microarray experiments are commonly discussed as prime motivations for proteomics research projects, such as protein sequence coverage, post-translational modifications, or cross-species homology matching. At the most recent American Society for Mass Spectrometry meeting in Seattle, aircraft-hangar-sized rooms were filled daily with new posters on proteomics experiments. I suspect that no two of them espoused the same combination of equipment and algorithms to solve even the most closely related biological problems. Many of them simply reported the results of comparisons of nearly equivalent laboratory instrumentation, with a series of graphs heaping finger-wagging scorn on the unfortunate candidate equipment and algorithms.
The lack of a coherent set of standard experimental protocols and disagreement as to what constitutes a proteomics result has led the PSI group far from the idea of simplifying proteomics. Recently, members have begun to consider an even more comprehensive set of informatic concepts that may allow them to describe any type of experimental scheme an enterprising analyst may think up. These generalizations take the form of the Functional Genomics Experiment (FuGE) and Functional Genomics Ontology (FuGO) projects. While both projects are still at the prototype stage, they may hold a possible answer for large-scale proteomics informatics.
A more positive approach would be for HUPO to step forward and provide some leadership. The Human Genome Organization had a straightforward goal: “Determine the sequence of the human genome.” Participants surveyed technologies, eventually settled on one, and lobbied for the necessary resources to complete the first draft.
Is it reasonable to ask for HUPO to have such a goal? The organization’s current goals are open-ended, such as the primary goal of the Liver Proteome Project: to “generate an integrative approach that will lead to a comprehensive functional map of the liver.” There is no concept of what would comprise an endpoint or even a “first draft” corresponding to such goals, and therefore there can be no technological race to achieve them.
Instead there is the oft-stated hope that in the end, regardless of which experiments are performed, bioinformaticists will be able to pull everything together and save the day. Just how we are supposed to achieve that task seems to have been left up to the bioinformatics community. Come to think of it, I guess the Nature editorial was right — there may be some hope for proteomics after all.
Ron Beavis has developed instrumentation and informatics for protein analysis since joining Brian Chait’s group at Rockefeller University in 1989. He currently runs his own bioinformatics design and consulting company, Beavis Informatics, based in Winnipeg, Canada.