NEW YORK, Sept. 14 - Peer review is a fundamental part of the scientific publishing process, so why shouldn’t it be required for software too? A growing number of bioinformatics practitioners are asking this question, prompting an organized effort to encourage public funding agencies to formally support open source software development.
In a paper being considered for publication in Briefings in Bioinformatics, Jason Stewart and his co-authors Harry Mangalam and Jiaye Zhou argue that federal agencies should require researchers to “release software funded by public grants under an accepted open source software license before the end of the grant.” To back up this assertion, Stewart has begun collecting signatures for a petition posted on his website (www.openinformatics.org).
Stewart, a former NCGR scientist who recently founded contract bioinformatics company Open Informatics, described the proposal as a logical extension of the scientific peer review process. “For scientists and researchers, our data is key. If our data is of good quality we can make reliable statements, but if our data is of bad quality then nobody can trust what we say. And that’s what the peer review process is meant to deal with. It’s meant to assign, through peers, a level of reliability to your work.”
MASSIVELY PARALLEL PEER REVIEW
But while researchers are reporting more and more experimental results based on computational analysis of high-throughput experimental data, they are not required to submit implementations of the algorithms used as part of their supporting data. This “black box” approach puts the validity of the entire experiment at risk, according to Stewart. “Unless we have the source code implementation for the algorithms that are doing the data transformation, we can’t assess the reliability of those implementations, therefore we can’t assess the reliability of the algorithm, therefore we can’t assess the reliability of the data.”
Because the open source software development process relies on a peer-review approach by definition, with multiple sets of eyes scanning for possible errors in an implementation, Stewart and his co-authors consider open source release a necessary component of future public funding policy.
The effort already has some prominent allies. Nathan Torkington, an O’Reilly and Associates editor and Perl programmer who helped draft the petition, noted that he became involved because “there’s a lot of interesting crossovers between scientific research and open source.” Tim O’Reilly, who founded the publishing company, has long been an open source advocate, and the petition “falls naturally into the type of things that we do,” Torkington said.
O’Reilly has pledged to support the effort and Torkington said there are plans in place to set up a panel discussion on the topic at the Bioinformatics Technology Conference the company is hosting in January.
Aside from the pragmatic argument that making source code available will serve as a more direct path to reproducible scientific results, Torkington added that there are philosophical issues at stake as well. “Public funding of research should lead to public funding of source code,” he said. “We don’t expect the government to pay for research that will never see the light of day. Likewise software.”
The argument shouldn’t be a hard sell for many at the National Science Foundation. Sylvia Spengler, a program director in the NSF’s Biological Databases and Informatics Division, said that while the agency has no official policy in place that addresses the issue, “There’s kind of an unspoken anticipation that people will make their software available and obviously open source is the way to do it.”
Proposals that indicate they will release their software as open source are “generally well received by panelists,” Spengler said, adding, “I’m a fan of open source. I’m always pleased when principal investigators put it in their plan.”
Spengler said she has many like-minded colleagues at the NSF and other federal funding agencies, but that a more formal policy would be a good idea because current policies leave most of the decisions up to individual program directors.
Financial incentives should also encourage granting bodies and researchers to sign the petition, according to Stewart. “If agencies make it such that developed software becomes open source, what they are making available is a common set of tools that have already been paid for. So as a scientist you no longer have to include a hundred thousand dollars in your budget to buy a proprietary gene expression database technology. NSF can take that hundred thousand dollars times 20 research grants every year and apply that to new fundamental science.”
GETTING THE BALL ROLLING
The initial petition is intended to be a general call to action to raise awareness of this issue, Stewart said. He intends to build momentum over the next six months and encourage discussion among industry representatives, NSF and NIH program directors, and bioinformatics developers through a mailing list (http://lists.sourceforge.net/lists/listinfo/openinformatics-petition). Key issues on the table are the particular type of open source license that will be encouraged and how the policy supported by the petition may need to vary between particular funding agencies. Another crucial topic will be crafting a policy that does not conflict with the Bayh-Dole Act, which allows universities to patent intellectual property derived from publicly funded research.
Daniel Gezelter, a Notre Dame biochemist and director of the Open Science Project, a non-profit organization funded by the Sloan Foundation to support open source scientific software, also backs the petition “wholeheartedly.” However, he noted that convincing the NSF, NIH, and DOE to change their current policies will require a great deal of effort.
Each agency has mechanisms in place to encourage the commercialization of technology developed with public funding, such as SBIR and ATP grants, Gezelter noted. “They all have this dichotomy between policies that were developed to commercialize new technologies, where the rules maybe should be a little bit different for software.”
“There are cases in which commercializing software makes sense,” Gezelter added. “Maybe if somebody commercializes their software using an SBIR grant they should be strongly encouraged to make it open source.”
Despite the early stages of the effort, Stewart said he’s seen a good deal of enthusiasm. Upon mentioning the petition to friends and business associates, he said, “Universally the comment is, ‘This is great. Where do I sign?’”
This story originally appeared in BioInform, GenomeWeb's weekly newsletter on bioinformatics. For more information, see www.bioinform.com.