This story originally ran on Sept. 8.
By Tony Fong
Name: David Gloriam
Position: Post-doc, University of Copenhagen, department of medicinal chemistry, biostructural research and molecular pharmacology groups, Sept. 2008 to present
Background: Post-doc, GlaxoSmithKline, Harlow/Stevenage, computational chemistry and computational biology groups, Aug. 2007 to Aug. 2008; post-doc, EMBL-EBI, Aug. 2007 to June 2008; post-doc, EMBL-EBI, IntAct team (molecular interactions database) in proteomic services group
In recent years, a number of initiatives have been started to tackle the poor quality of commercially available antibodies and the dearth of antibodies for numerous targets. These include the Human Protein Atlas, a separate initiative by the National Cancer Institute, the German Antibody Factory, and ProteomeBinders.
While such efforts may result in an increase in the number of quality antibodies, another problem exists: the databases containing information about such antibodies each have their own formats, which are incompatible with each other, making the exchange of data nearly impossible.
In an article published Aug. 11 in the online version of Molecular & Cellular Proteomics, a wide swath of researchers including those from HPA, NCI, GAF, and ProteomeBinders call for "a global community standard format for the representation and exchange of protein affinity reagent data," and specifically advocate for PSI-PAR as that standard format.
PSI-PAR, which can be used for any protein affinity reagent, is maintained by the Human Proteome Organization's Proteomics Standards Initiative. Because it is built on a mature and widely accepted proteomics standard format, PSI-MI, it has been thoroughly tested, a number of software tools have already been developed to support it, and it will require less maintenance than a new standard would, according to the authors of the MCP article.
They add that PSI-PAR would have advantages for both non-profit initiatives and commercial vendors. "Non-profit initiatives would benefit from the free access to the format and the associated tools," they wrote. "Commercial vendors would be attracted by increased market exposure of their products. Researchers wishing to purchase protein affinity reagents would benefit from the possibility of establishing centralized stores of quality-controlled PARs with larger choice, higher quality, and lower cost."
ProteoMonitor recently spoke with David Gloriam, the first author on the study, about the MCP article, the need for a standard format, and PSI-PAR in particular. Below is an edited transcript of the conversation.
What was the motivation behind this paper?
The work, in my case, was initiated by the ProteomeBinders community. It's a big consortium, a European consortium, arranging antibody producers and quality-control centers, informatics centers for suggesting which epitopes to target in the proteins, [and] a central database facility, and something that, in the end, would develop into a warehouse, where people can buy their affinity reagents [see PM 12/13/07 and 03/22/07].
So there are many different parts, and naturally these parts need to exchange information in a standard format. Also, data would accumulate on the same affinity reagents coming from different partners at different stages of the process, production, and target validation.
My work was initiated because of the ProteomeBinders consortium and their specific need for this format in their work. However, the work was elevated to a higher level, now a global level in the [framework] of the HUPO Proteomics Standards Initiative.
They present proteomics standards in a wide range of proteomic fields.
I've been seeing these people, I've been going to their meetings a lot, and the last author of this paper, Henning Hermjakob, has been the chairman of PSI for many, many years.
So we raised [it] to their level and we managed to find a lot of overlap in the needs of representing affinity reagents with the representation of molecular interactions. And that's why we [wrote] this manuscript and [built] this standard on the exchange format for molecular interactions.
Could you briefly describe what PSI-PAR is?
It is a standard format for the representation of antibody reagents, and all these different centers … need to have standard formats in order to understand each other when they exchange information.
It's a format based on XML, which is often used for storing data. The idea is that individual databases of all different partners can be connected by this standard format, and also LIMS systems, information coming directly out of equipment used by people, for example, in quality control.
Another example is, of course, there are many protein and nucleotide sequence databases out there. For example, if we compare the NCBI Entrez database and the protein sequences that are in there … to resources that EBI has and UniProt and Swiss-Prot, they all contain protein sequences, but they're not the same database, and they don't have the same database structure.
We're not trying to get people to use the same internal database; we're trying to get them to use the same exchange format. And this is what [this paper] is trying to do.
They can use their current databases as long as they can export and import their data in this standard format. They can all communicate.
It's like a second language to people.
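As a rough illustration of the export-and-import idea Gloriam describes, the Python sketch below shows two partners with different internal layouts communicating through one shared XML document. It is not taken from the MCP article: the element names (`reagentList`, `reagent`, `target`, `type`), the field names, and the sample values are all hypothetical and do not reflect the actual PSI-PAR schema.

```python
import xml.etree.ElementTree as ET

# Partner A's internal layout (hypothetical): a list of dictionaries.
partner_a_rows = [
    {"ab_id": "A-001", "target_protein": "EXAMPLE_TARGET", "kind": "antibody"},
]


def export_partner_a(rows):
    """Serialize partner A's rows into the shared (made-up) XML layout."""
    root = ET.Element("reagentList")
    for row in rows:
        reagent = ET.SubElement(root, "reagent", id=row["ab_id"])
        ET.SubElement(reagent, "target").text = row["target_protein"]
        ET.SubElement(reagent, "type").text = row["kind"]
    return ET.tostring(root, encoding="unicode")


def import_into_partner_b(xml_text):
    """Parse the shared XML into partner B's internal layout: tuples."""
    root = ET.fromstring(xml_text)
    return [
        (reagent.get("id"), reagent.findtext("target"), reagent.findtext("type"))
        for reagent in root.findall("reagent")
    ]


exchange_document = export_partner_a(partner_a_rows)
partner_b_rows = import_into_partner_b(exchange_document)
```

Neither partner changes how it stores data internally; each only needs one exporter and one importer for the shared format.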
Is it ultimately to prove that the data that comes out of these databases is accurate?
Absolutely. And you can make bigger databases if one partner can access the data from other partners. You would have a lot more information in these databases. It would also be easier for [companies] to sell this information.
Are there other formats aside from PSI-PAR that are under consideration as a global community standard?
For affinity reagents, this is the only one. For other proteomics standards, HUPO PSI has delivered a number of standard formats. The idea is that there should be only one standard because if people use different formats, they would be incompatible, so that would be a shame.
And HUPO PSI is the global authority in defining standards, so PSI-PAR was submitted for internal review of HUPO PSI before it was sent to a journal for a manuscript review.
In PSI's review process, it is first assessed by an internal committee of editors, who try to deem whether the scope is appropriate, and whether the standard has a real need, and whether the documents and technical details seem to be adequate.
Then they send it off to an external review and that review is open, so there should be the opportunity for people from all over the world to give their comments.
What is still in progress, and you haven't been able to read about it in the [paper], is that it's also common to have an agreement on which information to capture. This format can capture a lot of information, but there should also be guidelines about the minimum information to submit, and it's called the minimum information about the protein affinity reagent.
Is PSI-PAR specifically suitable for antibodies or is it for any kind of protein affinity reagent?
It is designed to be suitable for any protein affinity reagent. That's very important for ProteomeBinders as they also want to look into [different types] of affinity reagents.
This standard is trying to be very neutral, whereas I would agree that, globally, antibodies have the primary role.
How difficult would it be for someone to comply with this standard?
If someone manages a database and wants to export and import data, there are a number of tools available with this format that make it a lot quicker and easier to use this format than to develop their own format.
If there isn't a database in place already, and someone is setting up a new database, then there is the opportunity to use the existing IntAct database scheme. That's the database that EBI runs for molecular interactions, and it's open source, it's free, and it's 100 percent compatible with this format, meaning there is already an [application programming interface] so there [are] already the programming tools that you need to then develop export and import functions.
They're already there. So, you don't have to go to the bottom, you can already use what's there.
What has been the reception to this proposal?
Amongst ProteomeBinders partners, it has been great. It's been very, very positive. Of course, there is a need. They need to exchange this information. In general, standards need a lot of time to get in place. It takes time for people to agree on a common representation.
But here it's been pretty quick.
People have had some inkling about this idea for a while. Have you seen any differences in how it's being received by commercial vendors versus academic researchers?
No, surprisingly I haven't had many negative comments. I think that many [partners], especially commercial vendors, lag behind. They take their time to adopt standards, but if the support is strong enough, and if people are requiring [adoption of the standards], if other resources [and] databases are picking it up, after a while, they will also start to use it.
That has been the case, for example, with the PSI-MI format. That has been a format for a long time in molecular interactions. Now, there is some equipment [where] people [manufacturing] these machines are starting to export in the PSI-MI format directly from the machine, the read-out of the machine.
I think if ProteomeBinders proves to be successful, or if the Human Protein Atlas … is adopting this standard and promoting it, I think over the years, commercial vendors [will start] picking it up.
This is all voluntary. How would you get people to comply with this?
There has to be self-interest for them. They have to gain something by using it.
What is it that the commercial vendors gain by using this standard?
One thing we describe in the article is making their products more available. If, in the example of ProteomeBinders, they want to offer one big warehouse where all sources could submit their antibodies for sale and get the golden stamp of approval for quality control, that would certainly give them the stamp of quality, and bigger availability.
Is PSI-PAR still evolving, or are the standards and formats pretty set?
Standards have to evolve in order to represent new types of data, so as the data evolves, the representation needs to evolve.
However, they also need to be static for certain periods of time in order for people to adapt their databases to interact with the format. So you can't make big changes in the format, because that will break compatibility with databases.
What is standard for PSI-MI, the format which this standard extends, is that it is usually released in a new version every third or fourth year. There is some extendibility built into the format. There are some small updates that you can make without breaking the compatibility [in both PSI-MI and PSI-PAR].
However, for very, very big changes, you normally have to wait until this three- or four-year cycle.
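One common way an XML format can absorb small updates without breaking existing software is for readers to ignore elements they do not recognize. The Python sketch below illustrates that general technique only; it is an assumption for illustration, not a description of how PSI-MI or PSI-PAR actually implement extensibility, and every element name in it is hypothetical.

```python
import xml.etree.ElementTree as ET

# Fields that this (hypothetical) older reader knows about.
KNOWN_FIELDS = ("target", "type")


def read_reagent(xml_text):
    """Read only the known fields; silently skip anything else."""
    element = ET.fromstring(xml_text)
    record = {"id": element.get("id")}
    for name in KNOWN_FIELDS:
        record[name] = element.findtext(name)
    return record


# A document written before the update.
old_doc = (
    '<reagent id="A-001"><target>EXAMPLE_TARGET</target>'
    "<type>antibody</type></reagent>"
)

# A small update: one extra, optional element is added.
minor_update_doc = (
    '<reagent id="A-001"><target>EXAMPLE_TARGET</target>'
    "<type>antibody</type><storageNote>made-up field</storageNote></reagent>"
)

# A big change: an existing element is renamed, so old readers lose data.
breaking_doc = (
    '<reagent id="A-001"><targetProtein>EXAMPLE_TARGET</targetProtein>'
    "<type>antibody</type></reagent>"
)

same_after_minor_update = read_reagent(old_doc) == read_reagent(minor_update_doc)
target_after_rename = read_reagent(breaking_doc)["target"]
```

In this sketch the added element leaves the older reader's output unchanged, while the renamed element causes it to lose the target, which is the kind of change that would be held for a major release.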
Having the standard be part of PSI ensures the long-term maintenance and [the] survival of the format. That's one very important thing. Anyone could have made a format that they want to be the standard and produce it, send it off to a journal, and publish it. But after that, depending on the funding for the research group [that developed the standard], we've seen too often that things have died.
This format is guaranteed by the HUPO Proteomics Standards Initiative to be maintained. There are already a lot of resources for maintaining PSI-MI. So the long-term maintenance of both the exchange format and the controlled vocabularies that go with it is guaranteed by HUPO PSI-MI.
[Researchers] won't invest a lot of money into a standard that will die the next year. You have to have a long-term objective when you set up these standards, and that is in place.
What is your role in PSI-PAR?
I was sitting in between the existing format of PSI-MI … and the current need from ProteomeBinders for a data representation for antibody reagents. My role was to be able to understand both the informatics of the existing PSI-MI format and the needs of the biologists in ProteomeBinders to represent their data.
So being a bioinformatician, I try to merge the biology and the informatics basically.
Would it be very expensive to adopt this standard?
Relative to making something on their own, [PSI-PAR] would be a lot less expensive and a lot easier. How much it would cost, I can't say.
If this standard is adopted by the community, what do you think will happen to the quality of antibody-based proteomics being done?
I think there are two things here. One thing is the format being a global standard for antibody reagent representation, presented and maintained by PSI. The other thing is the ProteomeBinders consortium … and I think you will see different effects.
ProteomeBinders' overall goal is to make a warehouse available for people who want to buy antibodies, so that they can buy more antibodies, avoiding the problem that there is actually no antibody for the target they want to study, and to increase the quality.
However, even if ProteomeBinders doesn't get funding … at the level of PSI having a global standard, [PSI-PAR] still can be useful for all other efforts and initiatives. It's not depending on ProteomeBinders, at all.