He tapped a few keys to check his naughty and nice database, and yes indeed, the bioinformatics kids had been very good this year. They’d cleaned up so many messes — half-finished genomes, noisy microarrays, unreliable protein interactions — with hardly a complaint. This was a gift he’d gladly bring down the chimney.
There’s a friendly tradition for sharing source code in the bioinformatics software development community.
Free and open software has been a norm of the bioinformatics culture for years. Many members of the community generously provide their software in this manner. Prominent examples include BLAST, FASTA, sim4, AceDB, HMMER, EMBOSS, and Ensembl; there are many others.
The practice, however, is by no means universally accepted. Important software, such as Phil Green’s phred/phrap/consed suite and Warren Gish’s WU-BLAST, is free of charge to academics but not to commercial users, and is not really open to anyone because of restrictive licensing. Phil Green defends his approach on the grounds that the university uses the fees generated by software sales to fund new investigators and projects.
In the emerging area of microarray analysis, software developers seem to be retreating from the free and open credo. I surveyed 10 academic microarray packages chosen at random from the 37 such packages I know of. I found that five were clearly open, two others were free of charge to academics only, one was free for research purposes whether in an academic or commercial setting, and two offered no clear means to obtain the software. I have no comparable data for older areas such as sequence analysis, so I can’t tell whether this is a trend, but it sure feels that way.
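The tally above is easy to check with a few lines of Python. This is only a sketch: the category labels are shorthand invented here for the four outcomes reported in the survey, and the counts are the ones stated in the text.

```python
# Counts reported in the survey of 10 academic microarray packages.
# The category labels are shorthand chosen for this sketch.
survey = {
    "clearly open": 5,
    "free of charge to academics only": 2,
    "free for research, academic or commercial": 1,
    "no clear means to obtain": 2,
}

sampled = sum(survey.values())  # packages actually examined
known = 37                      # packages the author knows of


def share(count, total):
    """Return count as a whole-number percentage of total."""
    return round(100 * count / total)


shares = {label: share(n, sampled) for label, n in survey.items()}
coverage = share(sampled, known)

for label, pct in shares.items():
    print(f"{label}: {pct}%")
print(f"sample covers {coverage}% of known packages")
```

Half of the sample is clearly open, but with only 10 packages drawn from 37, the percentages are indicative at best.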
Beyond individual projects, a veritable shopping mall of organizations has sprouted to advance the open-source cause. The bio*.orgs — BioPerl, BioJava, etc. — are probably the oldest. These groups consist mostly of hands-on programmers and are focused on development using specific technologies, such as Perl, Java, and so forth. An umbrella organization, the Open Bioinformatics Foundation (Open-Bio), has recently formed to coordinate the bio*s. Chris Dagdigian of Blackstone Computing is the unofficial mother hen of this group.
Another old standby is Bioinformatics.org, founded in 1998 at the University of Massachusetts at Lowell. This group, which hosts the development of open bioinformatics projects and web sites, claims to have 1,216 members and to host 42 projects and web sites. Ken Marx is the current chair.
A new advocacy group, Open Informatics, has recently joined the fray. This group has launched a petition drive demanding that “researchers supported by publicly-funded grant[ing] agencies … be required, as a condition of funding, to publish any source code under an Open Source or a Free Software license.” By mid-October, the petition had garnered about 160 signatures. Open Informatics is led by Jason Stewart, formerly of the GeneX project at the National Center for Genome Resources in Santa Fe, NM.
Free and open software also plays a critical role in the computing world at large. Linux is probably the most famous example: Recent market numbers show it to be the most popular operating system for server computers after Windows, with about 30 percent of the market. (Windows has about 50 percent market share.)
Perl, well known among bioinformaticists, is widely used to deliver dynamic web content, as are PHP (an HTML-oriented scripting language) and Python (another scripting language). R, a free implementation of the S language on which the commercial S-PLUS is also based, is a widely used scientific programming language and is emerging as an important tool for microarray analysis. The Apache web server has 60 percent market share, which puts it far ahead of the number two, Microsoft’s IIS.
BIND, used to convert symbolic host names such as genomeweb.com into numeric network addresses, has 95 percent of the market. Sendmail remains the most widely used e-mail transport software on the Internet. MySQL and PostgreSQL are widely used relational database managers, and Berkeley DB is a popular high-performance non-relational system.
The Free Software Foundation was the pioneer of the free software movement. It funds the GNU Project (pronounced “guh-new,” but don’t ask what it stands for), which is a Santa’s shop of free software. Among their many accomplishments, GNUsters developed most of Linux except the kernel.
Richard Stallman, who established the foundation in the mid-’80s and mobilized a legion of free-spirited free software developers to create immensely useful software systems, received a MacArthur award in 1990 in recognition of his genius. His foundation talks about free software in eloquent, ethical terms, arguing that free software is “a matter of liberty, not price … ‘free speech,’ not … ‘free beer.’” It espouses four freedoms: “The freedom to run the program, for any purpose; the freedom to study how the program works and adapt it to your needs; the freedom to redistribute copies so you can help your neighbor; and the freedom to improve the program, and release your improvements to the public so that the whole community benefits.”
The Free Software Foundation developed two influential software licenses that embody its principles: the GNU General Public License, intended for standalone programs, and the GNU Lesser General Public License, formerly called the GNU Library GPL, intended for software modules that are only useful as elements of larger programs.
Meanwhile, another pioneering open-source organization, the Open Source Initiative, is a relative newcomer, founded in the late ’90s to promote free software among the masses. The group aims to be more pragmatic and less idealistic than FSF and actively courts the private sector. It adopted the term “open” rather than “free” to make the approach more palatable to the profit-minded, and used the word “source” to emphasize the goal of getting source code into the hands of developers.
OSI’s definition of open source has eight parts, the gist being that the software must be distributed in source code form, the recipient must be allowed to modify and redistribute it, and anyone who distributes the software must treat everyone the same (e.g., no special deals for academics).
OSI has not developed any licenses itself, but has a process for approving licenses developed by others. About two dozen licenses have been approved to date, including both of the FSF licenses.
Though it’s not obvious from the definitions, open source is a superset of free software. The difference is counter-intuitive: OSI allows you to give your software away without restriction, while FSF requires that you only distribute software to people who agree to keep it free. In practice, this means that you must impose one of the FSF licenses on anyone who wants your software.
Free and open software is an extraordinary gift to the community. We are fortunate that so many bioinformaticians choose to disseminate their software in this way.
It’s sad to see this culture weakening in new areas like microarray analysis. I hope it’s just a passing fad, and that the new generation will get with the program once they realize they’re not going to get rich off their bioinformatics creations.
I’m also saddened by the coercive thrust of the Open Informatics petition. In effect, the Open Informatics folks are trying to force software developers to be civic-minded. This is ethically wrong, it won’t work, and it distracts from the real goal of educating people about the value of free and open software.
There’s a compelling lesson to be learned from an ongoing effort by the US National Institutes of Health to increase access to research tools in general. A wonderful report was submitted to the NIH director in June 1998. Over the ensuing three and a half years, it was substantially watered down, and may be approved someday soon.
We don’t need Santa to bring us free and open software. It’s already here. Our challenge is to nurture the spirit that leads to such generous giving.
A Great Gift, But License Required
If you’re a programmer, you know that source code is a gift of gold. It lets you do anything to the program that the original developer could have done. You can fix bugs in the program, port it to a different computer or operating system, improve its performance, or extend its capabilities in ways large and small. You can learn the program’s secrets, apply them to new projects, and even carve the program into pieces to use in new ways. It’s potent stuff.
If you’re a non-programmer, source code is as good as a lump of coal. Even so, you will benefit from it when programmers use it to develop the software you use.
To exploit source code legally, a programmer needs a license. For institutions with savvy legal departments, licensing issues are as important as technical ones. It would be nice if there were a standard license that everyone used, but no such luck. Instead, we’re blessed with about two dozen “standard” licenses — great gift ideas for the lawyers on your holiday list.
The biggest bone of contention tends to be redistribution rights. The most lenient licenses place no restrictions on how you redistribute the software: you can give it away or sell it, and you can impose any licensing terms you want. More common are licenses that let you give it away but not sell it, and that propagate the same license to the new recipient. Some licenses prohibit all redistribution, but most aficionados would not regard this as free and open.
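For programmers who think better in code, the three families of redistribution terms described above can be sketched as a small data structure. Everything here is hypothetical: the field names, the three constants, and the helper function are invented for illustration and do not correspond to any real license text.

```python
from typing import NamedTuple


class RedistributionTerms(NamedTuple):
    """Hypothetical model of the redistribution clauses discussed above."""
    may_give_away: bool
    may_sell: bool
    may_relicense: bool  # recipient may impose different terms downstream


# The three broad families from the text; the names are invented for this sketch.
LENIENT = RedistributionTerms(may_give_away=True, may_sell=True, may_relicense=True)
SHARE_ALIKE = RedistributionTerms(may_give_away=True, may_sell=False, may_relicense=False)
NO_REDISTRIBUTION = RedistributionTerms(may_give_away=False, may_sell=False, may_relicense=False)


def counts_as_free_and_open(terms):
    """Rule of thumb from the text: a ban on all redistribution disqualifies a license."""
    return terms.may_give_away


for name, terms in [("lenient", LENIENT),
                    ("share-alike", SHARE_ALIKE),
                    ("no redistribution", NO_REDISTRIBUTION)]:
    print(name, counts_as_free_and_open(terms))
```

Real licenses carry many more conditions than three flags can capture, so treat this as a way to picture the spectrum, not as a compliance check.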
Some software developers, especially commercial vendors, employ a dual licensing strategy in which the recipient can choose to get the software for free under a free or open license, or can choose to purchase the software under a commercial license. The free license prohibits reselling of the software, while the commercial license permits it. Customers can opt for the commercial license if they’re planning to use the software as part of a larger product that they intend to sell.
Another controversial area is whether the software is free and open for everyone or just academics. Most devotees would argue that, in order to be labeled free and open, the software has to be available to everyone — academic and commercial — on the same terms.
Pragmatically, if you decide to limit your software to academics, you open a can of worms in defining exactly who qualifies as an academic, and what you’re willing to let them do with your work. Do you include not-for-profit research institutions? What about charitable organizations? How about high school students? Their parents? How about for-profit contractors working for academic groups? Can a writer like me use your software to write an article? Can a professor or grad student use your software in her new startup? - NG
The Season of Giving: Open-Source Orgs Aplenty
Open Bioinformatics www.open-bio.org
Open Informatics www.openinformatics.org
Free Software Foundation www.fsf.org/fsf
Open Source Initiative www.opensource.org
Report of the National Institutes of Health Working Group on Research Tools www.nih.gov./news/researchtools; see also ott.od.nih.gov/NewPages/xtramrl.html for subsequent revisions of the report
Where to Shop for Open-Source Software
BLAST NCBI ftp.ncbi.nlm.nih.gov/blast
FASTA Bill Pearson ftp.virginia.edu/pub/fasta
sim4 Webb Miller globin.cse.psu.edu
AceDB Jean Thierry-Mieg, Richard Durbin www.acedb.org
HMMER Sean Eddy hmmer.wustl.edu
EMBOSS Peter Rice et al www.emboss.org
Ensembl Ewan Birney www.ensembl.org
phred/phrap/consed Phil Green www.phrap.org
WU-BLAST Warren Gish blast.wustl.edu
Nat Goodman, PhD, helped found the Whitehead/MIT Center for Genome Research, directed a bioinformatics group at the Jackson Laboratory, led a bioinformatics marketing team for Compaq Computer, and has been consulting ever since. He is currently a free agent in Seattle. Send your comments to Nat at [email protected]