Sean Eddy of Washington University in St. Louis, author of a provocative call to arms against the Celera-Science deal, told BioInform that he and co-author Ewan Birney of the European Bioinformatics Institute have been getting a “great response” to the letter.
“I’m swamped with e-mail,” said Eddy, who is also a Howard Hughes Medical Institute investigator.
Under the agreement, Science will allow Celera to submit its human genome sequence data to the journal while making the data available only on Celera’s website, not through a public international database. Previously, researchers had to submit gene data to a public database before publication in Science.
Celera has agreed to make the published sequence data available free to academic users, but researchers who wish to download more than one megabase will have to sign a formal agreement not to redistribute the data. Commercial users will also be able to access the data, as long as they sign a material transfer agreement not to commercialize the results or redistribute the sequence.
In the letter Eddy and Birney say that the proposed restrictions on redistribution would present a significant obstacle to bioinformatics research.
“A large part of bioinformatics is redistribution,” Eddy said. “Every database that comes out is derived from a primary database, which is technically redistribution.” He estimated that half of the projects currently underway in his lab would be blocked under the current terms.
“The medical community wants more integrated data from bioinformatics. They want to see entire data annotation and we have to do it in a scalable way,” he said. “We don’t want to get blocked by a contract that says here’s the genome but you can’t annotate it.”
Despite Science’s claim that there are no reach-through restrictions on publication of a researcher’s results, Eddy argues that the journal is thinking in terms of traditional gene-by-gene access rather than large-scale bioinformatics.
“They may have underemphasized the number of scenarios that a bioinformatics lab could produce in which ‘publication’ might appear as ‘redistribution’ to Celera,” he said. Computational analysis that provides Web-based supplementary data, for example, would most likely be considered redistribution.
Celera has said that researchers would remain free to make, publish, and patent discoveries, and that it would not seek reach-through royalties.
Nevertheless, Eddy said confusion would persist until the actual terms of the agreement are made public.
“The problem is that we don’t really know what the terms are,” said Eddy. “What Science has published so far [regarding the terms of the material transfer agreement] is internally contradictory. You can download the data and publish, but you can’t show primary data without technically redistributing it. Showing a figure will disclose Celera data. The assumption is that this is below radar, but the contract you sign puts you in technical violation.”
Barbara Jasny, supervisory senior editor at Science, said that the journal would not make the terms of the material transfer agreement public until the paper is published, but did confirm that the terms have not yet been finalized.
The letter has sparked a flurry of responses from the bioinformatics community, where the subject is being hotly debated. Posted on www.bioinformatics.org, where it has drawn a number of replies, the letter is also a topic of debate on www.slashdot.org and the www.bioperl.org listserv.
Eddy said that the responses have been mixed, though most agree that the concerns raised in the letter are valid. “Some people are taking a knee-jerk anti-Celera reaction,” he said, “but I think what Celera is doing is pretty liberal. They risk their entire business model by going public with this information.”
According to Eddy, most responses uphold the view that DNA is pre-competitive information, and that the entire genomic community would benefit from a free exchange of all the available information. Companies could then focus on the downstream products rather than the data itself. “The more eyes that are on the genome, the better off we’ll all be,” he said. Many seem to fear that the Celera/Science agreement will set a dangerous precedent that may “balkanize the database.”
Opinion seems to be split as to whether this agreement is indeed a precedent for Science. Bioinformatics consultant Nat Goodman of 3rd Millennium pointed out that unencumbered access to intellectual property is not a requirement for publication. “If a company publishes analysis of a proprietary clone, they don’t have to give up the right to the commercial value of the clone,” he explained. “Because this is data, people feel differently.” Eddy said that many others have cited similar examples, although he maintained that this would be a precedent for DNA sequencing data, which has historically been publicly available upon publication.
Robert Beck, vice president for information research and planning at Baylor College of Medicine, sees the agreement as a dangerous exception to the journal’s standing policy of full disclosure. “Whether or not Celera makes it straightforward for researchers to get access to the data in this paper is secondary to the fact that this will make it easier to restrict access with regard to future advances, and I find this troubling,” he said.
Other bioinformaticists have been suggesting ways to get around the terms of the agreement. One researcher suggested in a bioinformatics forum that bioinformaticists could encode the data into an aggregate script, so that any “redistribution” occurring through publication of the results would be redistribution of code, not data. Eddy does not endorse this approach, however.
Christof Ouzounis, group leader at EBI, said that he hopes Science will consider the opinion of the scientific community before accepting the paper.
Compugen CEO Eli Mintz told BioInform that Birney and Eddy are raising good questions, but acknowledged that Science had a tough choice. “Do they let the data be made public in a way that may not be as usable as what would be expected from the public domain or does it remain in Celera’s safe?”
Mintz agreed with Birney’s and Eddy’s point that it’s difficult to know where to draw the line between publication and redistribution. “I’d like to read the material transfer agreement,” he said. “If it’s no good for academic research, then it’s a mistake.”
In response to the letter, Jasny defended the journal’s decision as necessary and appropriate.
“First of all, in our view we are remaining true to our principles of access,” said Jasny. “We also strongly feel the alternative was that this data would not see the light of day except to a few subscribers. We felt that was a bad solution for the general public.”
But she also said Science is aware of the concerns bioinformaticists have about being able to publish work that involves large chunks of Celera’s data, and hopes there will be a way for scientists to do this without running afoul of the agreement.
“As far as the concerns of people doing whole genome annotations, [there] is ongoing discussion with Celera — on how to allow that,” Jasny said.
Jasny also said that Science would receive a copy of the full sequence in escrow, on a DVD-ROM. The escrowed copy would serve as a guarantee that Celera upholds its end of the agreement.
As of press time, Eddy had not received a definitive response from Jasny regarding the letter, though he said that she “certainly seems willing to listen.”
Noting the challenge that Science and Celera face in crafting a material transfer agreement that will satisfy bioinformaticists, he said, “I’m hoping Celera decides to revert to their original business model, which was to release the primary sequence data freely but keep their annotation as a proprietary database.”