ACS Accuses NCBI's PubChem of Copying Its CAS Registry; Is Compromise Possible?


In a move that has taken NIH officials by surprise, the American Chemical Society is enlisting several political allies to pressure NCBI into scaling back its new PubChem small-molecule database, claiming that it duplicates ACS' subscription-based Chemical Abstracts Service.

In March, Ohio Governor Bob Taft and the state's congressional representatives drafted a letter to US Health and Human Services Secretary Michael Leavitt arguing that PubChem presents unfair competition for CAS that could lead to a loss of jobs in the state. CAS, a nonprofit, employs 1,200 people in Ohio.

Talks between ACS and NIH are ongoing, but the parties have yet to reach an agreement. Some observers believe that ACS is pushing to shut down the database altogether through its lobbying efforts or a lawsuit.

Michael Dennis, vice president of planning and development for CAS, told BioInform that the NIH has gone "way beyond [its] stated purpose" with the launch of PubChem, which came online in October. This purpose, Dennis said, was to provide a repository for small molecules and bioassay information generated under the Molecular Libraries Screening Center Network, an NIH Roadmap initiative.

"The problem for us is that they're not focusing on that," Dennis said. "Instead, they are adding small molecules to the PubChem database from anyplace that they can get them from, and in any category. … The categories are not just small molecules with drug-like qualities. Instead there are small molecules in there from categories like explosives, which probably aren't going to cure any diseases. Or categories like plastics, polymers, or industrial chemicals, like acetylene, and textile dyes — not drug-like small molecules."

Dennis said that ACS views the NIH Roadmap as "a wonderful initiative," and "we're not asking that they take PubChem down or kill PubChem." However, he said the database was not what ACS had expected. "If you were to look at a PubChem record and a CAS Registry record, you would see that if you put the two side by side, both in form and in substance, it's all the same information."

Christopher Austin, senior advisor for translational research at NHGRI and a principal leader for the Molecular Libraries implementation group, told BioInform that he and other NIH officials were "flabbergasted" by ACS' claims. "Both the topic and the ferocity with which that has happened has taken us by surprise," he said.

"ACS wants us to strictly limit the information in PubChem to only that information that comes out of the molecular library screening centers, and not allow data from any other source to be present in the database," he said. "The problem with that is that it would downgrade the value of the database to the community."

Austin noted that all of the 850,000 compounds currently in PubChem have come from publicly available databases — most of them NIH-funded resources — in an effort to populate it with some chemical information before the first data from the screening centers comes online later this year. "All the compounds that are in PubChem have been in the public domain for years and years," he said.

This effort to aggregate disparate public chemical data into a single resource was long overdue, Austin noted, pointing out that if the Human Genome Project had followed a similar model, "[and] you wanted to find the human genome, you would have to go to five different databases to find it, which makes absolutely no sense, and would radically impede the progress of research."

While Austin acknowledged that a few molecules with "minimal biological relevance" may have found their way into PubChem as a result of its automated porting process, he said many of the seemingly "industrial" chemicals are present "because they came from toxicological databases" hosted by the National Institute of Environmental Health Sciences and other NIH-funded research groups studying the biomedical effects of a wide range of chemicals.

As an example, Austin noted that the active chemical in warfarin, a blood thinner used to prevent strokes, is also the active ingredient in rat poison. "You might say that [molecule] has no business being in PubChem because it's used as a pesticide, but it's also used as a drug," he said.

Other observers said the information in the two resources is not as similar as ACS claims. "CAS was designed for chemists, and PubChem was designed for biologists, so there's obviously some overlap, but the fundamental user communities are different," said Steve Heller, a chemistry consultant who serves on the advisory panel for PubChem. "They're apples and oranges. The fact that they're both fruits is the only overlap."

Doreen Jesseph, marketing director for Dialog, a subsidiary of Thomson that sells its own subscription-based chemical databases, including Beilstein Facts and the Derwent Chemistry Resource, noted that there is "some degree of overlap" between most chemical information resources, but that the data in PubChem "is not as extensive as what you would get on Dialog or [CAS's] STN."

Jesseph said that Dialog considers all chemical databases — both public and private — to be competition, but that the firm does not view PubChem as a threat. "I don't think we're losing any business to free services. Right now, when we don't get business, it goes to another paid service," she said.

CAS certainly contains many more molecules, accompanied by much more information, than PubChem. CAS offers information on more than 25 million compounds, which have been manually curated over the last 100 years from the chemical literature and patents. The organization's staff added 2.5 million compounds to the resource in 2004 alone. PubChem, which has still not hit the million-molecule mark, has automatically ported existing collections of chemicals, so its level of detail and annotation does not approach what one would find in CAS.

Austin pointed out the "irony" that NIH is currently negotiating an enterprise-wide license for the CAS Registry — a license that would not be necessary if NIH researchers could get all the chemical data they needed from PubChem, he said.

As for competition, Austin pointed out that CAS has a staff of 1,200 and a budget in the hundreds of millions of dollars, while PubChem employs 13 people and has an annual budget of $3 million.

"If 13 government employees can put 1,200 CAS employees out of business, you have to wonder what kind of supermen the government is hiring," Heller said.

Nevertheless, ACS views its effort to scale back the free resource as "an issue of survival," Dennis told BioInform. While it's too early to tell how the launch of PubChem has affected the non-profit society's revenues, which totaled $410 million in 2004, "We do have customers now — some major customers — telling us that they are going to use the PubChem database in lieu of ours because it is free," he said.

CAS and the ACS publications division contributed $371 million, or 90 percent, of the organization's 2004 receipts, although ACS did not break out sales for these units. Dennis declined to provide specific revenue numbers for the CAS Registry, but he did say that the resource is responsible for most of the organization's income. "We build some other databases, but the Registry is the crown jewel," he said. "It's the core database here at Chemical Abstracts Service."

Negotiations at an Impasse

ACS and NIH do agree on one thing: Each side considers the other to be uncooperative. "We've offered several ways to collaborate, and unfortunately, the NIH has rebuffed us," said Dennis. "They don't have any interest, at least at the moment, in working with us."

Austin, however, said that ACS has taken a "confrontational" approach. "Even when we were meeting with them, and, we thought, negotiating with them to try to work out some mutually agreeable resolution, we continued to get letters from congressmen that meant that they were still pitching their line in the political direction, with the threat of cutting off funding."

He said that NIH expected some resistance from industry when it launched the Molecular Libraries initiative, and that it has already successfully assuaged pharma's fears that it would compete on the level of drug development. The agency also thought it could have similar results with the ACS.

"At first, we thought this was going to be a misunderstanding, because this is so new for NIH to be getting into and there have been misunderstandings at multiple levels," Austin said. "So we thought that once we explained to ACS and CAS what we were actually doing, they'd say, 'Oh well, in that case, don't worry about it.' But that hasn't happened."

In fact, Austin said, "We thought that CAS would view this as a great opportunity," because PubChem intended to link out to the CAS Registry in much the same way that PubMed links out to subscription-based journals. "We would like to … make those links available, but we can only make those links if CAS gives us the information to allow the links to be made," he said. Rather than cutting into the CAS subscriber base, Austin said that PubChem might actually help expand it. "200,000 NIH grantees could be their potential customers," he said.

Most recently, NIH proposed a technical working group to address some of the issues that CAS has raised, but Austin said that CAS had not yet agreed to meet. Meanwhile, he said, NIH officials plan to meet with "the folks on the hill" this week. "They need to hear the other side," he said.

CAS' Dennis declined to comment on ACS' future plans.


NIH and ACS also agree that this situation is not comparable to the public/private flap that ensued over Celera's genome database, a once-hot controversy that ended with a whimper last month when the company announced that it would stop selling the database and release its data into the public domain [BioInform 05-02-05].

"There wasn't a database of all the world's biosequences already established when the NIH or NCBI created GenBank," Dennis said, noting that CAS "is an established private-sector database that's been out there for 100 years."

Predictably, Austin's take on the comparison is slightly different. Even though "Celera decided to give up the ship" largely due to the availability of public sequence data, the public and commercial genome-sequencing efforts "were doing the same thing, so it was a duplication." In the case of PubChem, he said, "there is no desire — or even the resources — on PubChem's part to duplicate the hand curation that CAS does."

Some observers pointed to the case of PubScience, a Department of Energy database of journal abstracts that was shut down in 2002 after several commercial publishers convinced Congress that the resource presented unfair competition. Austin said he doesn't consider this to be a relevant precedent because, again, the databases of interest in that case did in fact duplicate each other, while "the purpose of PubChem is different than CAS."

Others point to a more recent example as a possible omen for the outcome of the PubChem situation: In December, ACS sued Google, claiming that its Google Scholar literature-search functionality infringes upon its own SciFinder Scholar trademark.

ACS has denied that the suit is related to the fact that Google's service is free while SciFinder Scholar is a subscription-based resource.

So far, ACS is "treating PubChem like it would treat a commercial competitor," Austin said. "I can understand that they're nervous — there's never been anything in the public sector like this before — but I think they're taking the worst possible view."

Austin believes that "once the data screening start coming out, and once the biomedical research community starts realizing that chemicals really are important and that they want to work with them — and if we could convince CAS to make connections between PubChem and CAS — then I think they're going to find that their fears are just not warranted."

The "biggest fear" for Austin is that "there's going to be some political movement to put the kibosh on this before it even gets out of the cradle," he said. "That would just be a terrible travesty."

A spokesman from Ohio Governor Taft's office was unable to comment in time for this publication.

— Bernadette Toner

