Formalizing a longstanding — but unofficial — alliance, the US-based Protein Data Bank, Europe’s Macromolecular Structure Database, and the Protein Data Bank of Japan last week announced plans to become the collective custodians of the newly formed worldwide Protein Data Bank (wwPDB).
But despite the new name, a new website (www.wwpdb.org), and a high-profile announcement in the December issue of Nature Structural Biology, those depositing or accessing structural biology data will see few changes as a result of the new collaboration. “Operationally, as far as the users are concerned, there will be no difference whatsoever,” said Helen Berman, director of the Research Collaboratory for Structural Bioinformatics, the consortium of three US institutions that oversees the PDB. Rather, the wwPDB signifies an agreement between the three international groups to keep things running pretty much as they have been for the last five years.
“We’ve been actually carrying out informally what we’ve formalized with this agreement for many years,” said Kim Henrick, head of the MSD at the European Bioinformatics Institute. “Previously, it was very informal, but it worked very well.” The primary goal of the wwPDB, therefore, is to ensure that those features of the three-way collaboration that have worked smoothly in the past will continue to do so. “It will give assurance to everybody that there will simply be one PDB, and one set of identifiers, and one archive,” Henrick said. “Perhaps some part of the community might have thought that the EBI might have gone separately, but we had no intention of ever doing that.”
The formal collaboration also underscores a trend among the major international repositories of biological data to provide integrated, non-redundant resources for their users. Swiss-Prot, for example, was awarded $15 million in NIH funding last year to combine its protein sequence information with that of the Protein Information Resource to create the new UniProt database [BioInform 10-28-02], and the major nucleotide data providers — GenBank, EMBL, and DDBJ — have collaborated since 1986 on a data-exchange policy and a standard data format to ensure interoperability.
The creation of the wwPDB reinforces the view that “all biology data is a global resource, and there needs to be shared responsibilities” among database providers, Henrick said.
Tying the Knot
The three wwPDB collaborators (the RCSB, the EBI, and the PDBj at the Institute for Protein Research at Osaka University) signed a charter agreement in July to continue their existing arrangement to maintain a single archive of structural data. The charter is available at http://www.wwpdb.org/wwpdb_charter.html.
The charter defines the “PDB archive” as the sets of flat files in the PDB format and the mmCIF format. The three sites agreed to distribute the archive with mirrored content and the same subdirectory structure. RCSB has been designated the “archive keeper” for the PDB. This gives it sole write access, control over directory structure and content, and responsibility for distributing new PDB identifiers to the deposition sites. Each of the three sites, however, will have flexibility regarding website design, browsers, the database query engines used to access the data, and deposition software packages.
“It’s basically agreeing to keep certain things standard no matter what, while giving freedom in other areas so that all the creativity can be expressed in something other than the representation of the data,” said Berman. Giving the various sites leeway in terms of their database front ends is an important aspect of the agreement, she noted. “Protein structure data is extremely rich and subject to a lot of interpretation, and we feel that lots of people — including ourselves — should be able to interpret it, if we wish, in different ways by putting them into different kinds of databases.”
The agreement also provides mechanisms for the collaborators to work together on future data formats and delivery methods. For example, the three groups are already co-developing a single XML format for PDB files, which is currently undergoing testing, Berman said.
In the future, the three groups might also exchange visualization or search tools, work on adding other types of structural data, or collaborate on other aspects of the database, Berman said, but those details have yet to be worked out. The wwPDB members will hold their first planning meeting in March, where they will discuss some of these longer-term plans and will also appoint an advisory panel made up of representatives from the three sites as well as from the International Union of Crystallography and the International Council on Magnetic Resonance in Biological Systems.
The initial term of the charter agreement is 10 years.
Same as it Ever Was
The agreement to preserve the operational status quo for the PDB serves as a pledge that the wwPDB will continue to offer a single, freely available archive, regardless of unforeseen changes in the funding or political climate. The long-term availability of public biological data resources is not guaranteed, but international collaboration does provide a bit of a safety net. In the case of Swiss-Prot, for example, European funding cuts forced the Swiss Institute of Bioinformatics to create a commercial arm to sell licenses to the database just to keep the project going. Now, with funding from the US NIH, the proposed UniProt database is a truly international project, and is secure for at least the length of the grant, which expires in August 2005.
As biological data repositories continue to evolve as global, rather than local, resources, formal collaborations such as the wwPDB help lay out the ground rules for future caretakers, Berman noted. “We feel very strongly that the PDB is an international effort, no matter who funds it, and we wanted to keep it that way,” she said.
Unlike UniProt, the wwPDB did not receive any additional funds from any of the three sites’ funding sources, Berman said. Rather, the initiative grew from a desire among the three sites to guard against propagating different versions of the same dataset. “This is basically an international agreement to protect the data from getting messed up,” she said. “We haven’t messed it up so far — we don’t think — but there’s tremendous potential for something bad happening because all the agreements have been informal and not in writing.”
The agreement is likely to eventually have some impact on funding agencies, however, which are loath to fund duplicate efforts across multiple sites. In addition, nascent international data repository projects for other types of biological information, such as microarray gene expression, genotype, pathway, and protein interaction data, now have another example to follow to avoid a Babel of duplicate database efforts.