The National Human Genome Research Institute is funding a project at Stanford University that will evaluate how open source software licenses affect the adoption and use of bioinformatics software.
Mark Lemley, a professor at Stanford Law School, is the principal investigator on the project, which he is conducting in collaboration with Wei Zhou, senior director of advanced technology at the legal department of Affymetrix.
Lemley told BioInform that the study will use a software package from Affymetrix that is available under both an open source and a proprietary license, making it an ideal test case for the project.
“You can basically get the same program for free under the open source license or you can pay money for it, and that provides a perfect natural test of who chooses open source software and what they do with it,” he said. “We’ve got identical software here, so we can follow who uses what and [ask], ‘Do they modify it, do they turn it into a resold product, does it get used in studies that show up in published papers?’”
BioInform was unable to confirm exactly which Affy software package is involved in the study. Zhou said through an Affy spokesperson that the project involves the company’s gene-expression software, but did not provide further details.
Affymetrix began releasing some of its internally developed applications under open source licenses in 2004. At the time, Steve Lincoln, vice president of informatics, told BioInform that open source was becoming a “key part of our developer support strategy, where we are continually trying to reduce any barriers folks could have in building solutions around our GeneChip platform.” [BioInform 02-21-06]
While the academic bioinformatics community has long embraced open source software, and companies like Affymetrix have recently begun to turn toward open licensing models, Lemley said that the benefits of open source licensing — within bioinformatics and more broadly — have yet to be proven empirically.
The outcome of the study will likely be “less of a bio payoff than a more general teaching about open source software,” he said, “but it’s a real interesting natural experiment.”
The grant abstract cites a debate in the bioinformatics community that has been brewing since 2001, when a group of scientists petitioned the National Institutes of Health and the National Science Foundation to require grantees to distribute software developed with public funding under open source licenses [BioInform 09-10-01].
Since then, “a number of bioinformaticists have argued that funding agencies should not require that grant-funded software development projects be distributed under open source licenses,” the abstract states. “These researchers cite the lack of empirical evidence supporting the assertion that software tools distributed under open source model have a significantly higher probability of success.”
The study aims to survey several hundred users of the Affy software and track citations of the software in the scientific literature in order to quantify usage and adoption trends. It also aims to track improvements to, and derivatives of, the original software.
Lemley said there are three key questions that he hopes to answer. First, “Is there a difference in who uses open source — say between the academic and the business communities?” In addition, he said, “Does one side — open source or proprietary — tend to be used more by people who publish research papers as opposed to doing things internally?”
The third question is whether people are modifying the open source version of the software. “The big supposed advantage of open source is that anybody can modify it and make changes to it, and this is a great way to find out if that’s really what people are doing,” Lemley said.
The project began about a year ago, but only recently received a two-year grant that runs through 2008. According to the NIH database, the project was awarded $145,936 in 2006.
“Probably in the next 18 months we’ll be putting together the data and collecting the survey information,” Lemley said.
The grant abstract lists an additional goal for the project: “to formulate a public policy recommendation … for genomic analysis software licensing strategies,” but Lemley described that aspect of the study in fairly broad terms.
“You could imagine it influencing decisions to release something under open source or not,” he said. “It might have influence for law, in terms of deciding whether the open source model really is one that promotes freedom to change things, or whether it turns out to be more limited.”
The primary goal of the study, he noted, is to determine whether a number of assumptions about open source software are actually true.
“The conventional wisdom would say that open source is more likely to be used by academics, and proprietary software by private companies. And I think the conventional wisdom would also say that open source is more likely to be used by tinkerers, and people who actually want to change the program. But whether they’re true, who knows?”