In what a company official described as "something of an experiment," Elsevier MDL said last week that it will deposit around 2,500 structures from its xPharm database of pharmacological activity into NCBI's publicly available PubChem resource.
Phil McHale, vice president of marketing and corporate communications at Elsevier MDL, told BioInform that the agreement will serve as a test case to determine "whether this sort of linking is beneficial both to the users of PubChem and to our customers."
Elsevier MDL and NCBI will monitor traffic and usage, McHale said, "and, depending on the outcome, we might consider links from PubChem through to [our] broader set of databases of chemical structures and associated properties."
McHale stressed that the company is not making "any commitments at the moment," but said it would consider following a similar model for other databases in its broad DiscoveryGate collection.
The xPharm database is one resource in the DiscoveryGate platform and includes pharmacological data on the top 2,500 "major, marketed, known active compounds," McHale said. The company has deposited the chemical structures for these compounds in PubChem, along with links back to xPharm. Licensed users will be able to immediately access the additional information available in the resource, while those without licenses "can see sample records, and they can sign up for an evaluation and decide whether they want to pursue that link and see more information," McHale said.
Elsevier MDL opted to start with xPharm because it's a "small, clean, well-curated database, so it's easy to manage from both our end and from the PubChem end," McHale said. "We didn't want to get embroiled in processing thousands and thousands of structures."
Copyright issues were also a concern, because some of the other DiscoveryGate databases include data from third-party partners. "Not to say that we couldn't do it," McHale said, "but it would just make it more complicated to be able to do it in a timely way."
McHale said that some of the structures from xPharm were already in PubChem, but he was unable to provide specific numbers on the degree of overlap.
While the structures may not be entirely novel to PubChem, the agreement is still a coup of sorts for NCBI and NIH. Earlier this year, PubChem drew fire from the American Chemical Society, which claimed that the publicly available small-molecule repository posed a competitive threat to its subscription-based Chemical Abstracts Service [BioInform 05-16-05].
NIH officials have maintained that PubChem actually creates an opportunity for commercial information providers, who can use the free database as a vehicle to expand their customer base into the large biological community that accesses NCBI's resources. Elsevier MDL is testing that claim through the xPharm agreement.
McHale said that the issue of duplication "was a factor that we weighed, but we don't perceive it as a major threat because, really, all we've provided is a pointer. The only people that can link seamlessly from PubChem to xPharm are the people who have already licensed it. So that piece of the equation is covered. And any other people who would want to see that data because they landed at PubChem and got to xPharm are potentially incremental users for us."
NCBI's Steve Bryant, director of PubChem, said that in the bioinformatics world, "there really aren't a lot of commercial information sources that cover a broad swath of biology, but that's just not true in the chemical world."
As extensive as NCBI's biomedical resources are, "we don't really have coverage of the physical chemistry literature; the organic chemistry literature, only spotty, here and there. We have nothing, really, about the tremendous patent literature in chemistry. So there's a real complementarity here because the commercial services do focus on these areas, and we can't," he said.
"In a way, xPharm is a bit of an example of that," Bryant added. "That resource is telling you the pharmacology of this drug target, and it points you off to other information that MDL has. Who has the patents on this, and what is the literature on how this was synthesized, and so on."
Bryant said that NIH recently hosted the first meeting of a working group of private-sector participants that was formed to advise NCBI on issues of interest to commercial providers of chemical information [BioInform 09-12-05].
He said that representatives from around 15 commercial firms attended the meeting, and that the agreement with Elsevier MDL was discussed as an example of "a way to expose their information resource to the biologists who use NCBI, and there were some comments from others that this was something that they may like to explore as well."
Bryant acknowledged that ACS representatives and others at the meeting continued to voice concerns that PubChem posed a threat to their businesses, and admitted that "what the PubChem activity is and what the plan for it is wasn't clear to everyone."
Specifically, he said, "there were at least some members of the committee from the private sector who really didn't understand that the way data gets into PubChem is that depositors send chemical structure records. We don't type them in here, we don't even edit them here. It's a deposition and the record belongs to the depositor."
The meeting helped clear up some misunderstandings on both sides, Bryant said. "Myself and my colleagues here at NCBI are from the bioinformatics world, and to us the idea of a deposition-based database like GenBank has been around for a couple of decades, and the issue of what is a public activity and how it relates to the private sector, we think we understand that. But in the chemistry world, there really hasn't been this kind of activity before, any deposition-based, open, public system."
The "consensus," he said, "was that the group should meet again in a few months … to comment on the pros and cons of interacting" with PubChem. In addition to the links from PubChem to Elsevier MDL's xPharm, Bryant said that several vendors, such as ACD/Labs and ChemIndustry, now provide links from their online resources to PubChem, and that more than half of PubChem's hits now come from external chemistry sources.
Bryant estimated that PubChem currently has around 20,000 visitors per day.
The resource is growing rapidly. Bryant said that since June, the number of structure records has grown from 1 million to 8 million, and the number of unique chemical structures has grown from 700,000 to 5 million. He said that he has recently expanded the PubChem staff to add several curators for bioassay data, and that the resource is now supported by 15 full-time employees.
The first results from the NIH Chemical Genomics Screening Centers are expected to come online "early next year," he said.
Elsevier MDL's McHale said that the company expects to "make some decisions" regarding its relationship with NIH and PubChem in the first quarter of next year.
Bernadette Toner