The world of scientific publishing is vastly different today than it was a decade ago. The onslaught of genomic and other biological data derived from new analysis techniques, combined with the ubiquity of the web, has forced the scientific community to contend with issues that simply didn’t exist before: How much data, and what types of data, should an author be required to make publicly available upon publication? What should the terms of access to this data be? What is the best method of delivery for this information?
These concerns came to a head last year with Science’s decision to publish Celera Genomics’ paper on the sequence of the human genome with limited terms of access to the company’s data. “Many people thought that some community standard had been broken by that act,” said Robin Schoen, a project director at the National Academy of Sciences who heads up a new initiative at the NAS to examine such issues.
Spurred by a flood of requests from scientists angered by the Science/Celera case, as well as some who supported Science’s decision, the academy launched the project, “Responsibilities of Authorship in the Biological Sciences,” in November. As part of the initiative, funded by the NHGRI, the NCI, the NSF, and the Sloan Foundation, a committee of ten scientists, entrepreneurs, publishers, and representatives from the federal funding agencies met last week for a workshop. Schoen said that the findings of the panel would be used to prepare a report to guide the community as the use of large data sets in biological research becomes commonplace.
Schoen said panel members were hand-selected by the academy to represent a wide range of perspectives on the workshop topic, “Community Standards for Publication-Related Data and Materials.” Chaired by Thomas Cech, president of the Howard Hughes Medical Institute, the meeting featured a keynote by Eric Lander and a full day of discussion about what, if any, community standards for biological publishing currently exist and whether they can and should be enforced. Not exactly the kind of topic you can hash out over tuna sandwiches, but several attendees of the meeting noted that the dialogue was an important step in the right direction.
Recognizing the risk of simply drawing up what Ari Patrinos, director of biological and environmental research at the US Department of Energy, termed “feel-good statements,” panel members attempted to delve a bit deeper to reach agreement on some of the stickier aspects of data access. “There may be dispute on how things are shared actually, but the committee is trying to go to the heart of what’s critical in terms of publishing and sharing,” said Schoen.
Committee member Sean Eddy, a bioinformaticist at Washington University, summarized: “Everyone agreed that publication was a special moment, and upon publication the data should be made as available as possible to any qualified person who asks for it, in a form that allows people to build on it as well as to replicate the data.”
However, Eddy said, complications begin to arise as soon as deliberately vague terms like "as available as possible" enter the discussion. Noting that mechanisms to protect intellectual property, such as patents, copyright, material transfer agreements, and the Bayh-Dole Act governing technology transfer, "conflict with some of what we'd like to do in science," Eddy conceded that agreement on the details could get "messy."
Even messier are the tangential questions that creep into the discourse once these issues are put on the table. Schoen noted that it was a bit of a challenge "keeping everybody on track" at the workshop. Questions about enabling technologies arose, such as whether source code for bioinformatics software ought to be provided upon publication and whether there ought to be public repositories for these software tools. So did the issue of prepublication data, a particularly hot item in light of last week's Science article on biologist Mitchell Sogin being "scooped" by his own Giardia genome data (see sidebar, p. 3).
“Those are all important issues,” said Schoen, “but we’re just trying to get it as close to a manageable topic as we can.”
At the heart of the matter, according to Eddy, is the current “disconnection” between journal papers and their supporting data, which must now come in the form of an electronic supplement of some kind. “We have to make it clear that the data that are associated with the paper actually have to be as available as the paper,” he said.
The final NAS report is expected to be published by the fall of this year, following anonymous peer review, a response from the committee, and final approval by an independent body at the academy. Schoen said the NAS also intends to summarize the key points of the report's findings in letters to the editors of several key journals.
Though what those findings will be remains an open question, panel members were encouraged by the steps taken so far. "Even if we can't nail down the details, just stating the principles is an important thing," said Eddy.