A new protein-identification database called PRIDE that allows researchers to search by tissue will be released at the Human Proteome Organization conference in Beijing at the end of the month, according to European Bioinformatics Institute engineers who developed the database.
“PRIDE is a database which says in this tissue, under these conditions, we have found these proteins,” said Philip Jones, a software engineer at the Cambridge, UK-based EBI. “It is a database for storing protein identifications and protein experiments.”
According to Jones, the development of PRIDE began in 2003 after the need for a database like it was discussed at a meeting of researchers involved in the HUPO Plasma Proteome Project. By the end of this month, PRIDE should be deployed to store data from the PPP, the HUPO brain project and the HUPO liver project, said Jones.
“Right now PRIDE is still a fairly prototypic system,” said Jones. “Six months from now we hope to have a mature system with quite a lot of data.”
Once PRIDE has been launched, researchers will be able to access it free of charge at www.ebi.ac.uk/pride. The database is open for software development, meaning that researchers can download its source code and modify the code for their own purposes.
Each entry in the PRIDE database contains the proteins identified, the tissue, the experiment conducted, the conditions of the experiment, a list of peptides used to make identifications, any post-translational modifications to those peptides, and links to any publication describing the experiment.
The database can also store details of the methods used to make each protein identification, including whether the experiment was gel-based, what kind of mass spectrometer was used, and what parameters were used, along with scores that indicate the likelihood that the identification was correct.
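As a rough illustration of the kind of record described above, the sketch below models one experiment entry in Python. It is not PRIDE's actual schema: the class names, field names, and the placeholder accession in the usage note are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Peptide:
    """A peptide used to support an identification (hypothetical model)."""
    sequence: str
    modifications: List[str] = field(default_factory=list)  # post-translational modifications


@dataclass
class Identification:
    """One identified protein, keyed by its accession number."""
    accession: str
    peptides: List[Peptide] = field(default_factory=list)
    score: Optional[float] = None  # likelihood-style score, if one was reported


@dataclass
class Experiment:
    """The experiment is the top-level unit; identifications hang off it."""
    title: str
    tissue: str
    conditions: str
    gel_based: Optional[bool] = None
    mass_spectrometer: Optional[str] = None
    parameters: Dict[str, str] = field(default_factory=dict)
    identifications: List[Identification] = field(default_factory=list)
    publications: List[str] = field(default_factory=list)

    def accessions(self) -> List[str]:
        """Return the accession numbers of all proteins identified."""
        return [ident.accession for ident in self.identifications]
```

An entry could then be built with something like `Experiment(title="demo", tissue="plasma", conditions="untreated")`, with `Identification("ACC0001", ...)` objects appended to its `identifications` list; `ACC0001` is a made-up placeholder, not a real accession.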
“The overarching concept is in the experiments,” Jones emphasized. “The database contains a description of the experiment, the location of tissues, any number of samples and details from the protocol.”
At the moment, researchers may search PRIDE by tissue, by protein, by sample number, or by experiment title. In the future, they will also be able to search it by publication and by some other parameters, Jones said.
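The four search criteria Jones lists can be pictured as simple filters over experiment records. The sketch below is only an illustration under assumed names: `search_entries`, the dictionary keys, and the matching rules (exact match for tissue and sample number, substring match for title) are hypothetical and are not taken from PRIDE's interface.

```python
from typing import Any, Dict, List, Optional


def search_entries(
    entries: List[Dict[str, Any]],
    tissue: Optional[str] = None,
    protein: Optional[str] = None,
    sample_number: Optional[int] = None,
    title: Optional[str] = None,
) -> List[Dict[str, Any]]:
    """Return the entries that satisfy every criterion that was supplied."""
    results = []
    for entry in entries:
        # Tissue: case-insensitive exact match.
        if tissue is not None and entry["tissue"].lower() != tissue.lower():
            continue
        # Protein: the accession must appear among the entry's identifications.
        if protein is not None and protein not in entry["proteins"]:
            continue
        # Sample number: exact match.
        if sample_number is not None and entry["sample_number"] != sample_number:
            continue
        # Experiment title: case-insensitive substring match.
        if title is not None and title.lower() not in entry["title"].lower():
            continue
        results.append(entry)
    return results
```

Calling `search_entries(entries, tissue="plasma", protein="ACC0001")` would return only the plasma experiments in which the placeholder accession `ACC0001` was identified. Criteria that are left out are simply not applied.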
One feature of PRIDE that was developed with HUPO needs in mind is the ability of data to be shared privately by multiple labs working in collaboration, such as the labs working on the Plasma Proteome Project, Jones said. Research labs can publish their data privately on the database, allowing collaborator labs to have a first stab at analyzing the data before sharing them with the public.
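One way to think about this private sharing is as a visibility check on each entry. The following is a minimal sketch under assumptions, not a description of how PRIDE implements access control: the `public` flag, the `collaborators` set, and the lab names are all hypothetical.

```python
from typing import Any, Dict, List


def visible_entries(entries: List[Dict[str, Any]], lab: str) -> List[Dict[str, Any]]:
    """Return the entries a given lab may see.

    An entry is visible if it has been released publicly, or if the
    requesting lab is listed among its collaborators.
    """
    return [
        entry
        for entry in entries
        if entry["public"] or lab in entry["collaborators"]
    ]


def release(entry: Dict[str, Any]) -> None:
    """Make a privately shared entry public."""
    entry["public"] = True
```

Under this model, a consortium's data would start with `public` set to `False` and a list of collaborating labs, and `release` would be called once the collaborators had finished their first pass at the analysis.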
PRIDE differs from the UniProt/Swiss-Prot annotated protein sequence database, also developed by EBI, in that it is centered on protein identification, rather than protein description, said Jones. The new database is linked to UniProt/Swiss-Prot by protein accession number.
“I think PRIDE is filling a fairly unique niche. It’s not competing with the likes of UniProt/Swiss-Prot,” said Jones. “I don’t think there’s going to be a problem with getting people to submit their data to PRIDE.”
The PRIDE system currently has its own XML Schema, or specification of how data should be structured, but that schema is only temporary until formats from the Proteomics Standards Initiative are established, said Jones.
Founded in 2002, the PSI aims to develop standards for mass spectrometry and protein-protein interaction data. One of the first things established by the PSI is the Minimum Information About a Proteomics Experiment, or MIAPE, requirements, which will be used in developing PRIDE’s schema.
In terms of a timeline for future development, Jones said that PRIDE would be publicly accessible and showcased at the end of the month at the HUPO conference in Beijing. Following that launch, the database would be beta tested for about six months, during which database developers would be open to ideas from users. At the end of these six months, the database would be further developed to add more functionalities.
One functionality that PRIDE’s engineers are looking to add to the database is the ability to include a mass spectrometry peak list for the identification of peptides. Such a peak list would be useful if a protein identification is deleted and reanalysis of data is needed.
“By including peak lists in PRIDE, we’ll enable people to do that — to reanalyze data,” said Jones.
PRIDE was initially designed and developed by Lennart Martens, who is now at the University of Ghent, in Belgium. The database took about six to eight months to develop, said Jones. Besides Martens and Jones, other people at EBI who worked on the database include Henning Hermjakob, who coordinated and managed the project; Rolf Apweiler, who headed the sequence database group; and software engineers Samuel Kerrien, Mark Rijnbeek, Kai Runte, and Chris Taylor.