PHILADELPHIA--The bioinformatics community recently moved a step closer to the goal of creating a common library of easily shared software components, known as bioWidgets, with the start of preliminary testing of a prototype bioWidget from the University of Pennsylvania's Computational Biology and Informatics Laboratory here. In addition, there has been significant new progress in efforts to forge common standards for transferring genomic data among software components, despite lingering controversy over the issue.
Ultimately, researchers should be able to simply download bioWidgets from the internet and custom-assemble them to create specialized programs for such tasks as gene mapping and data mining. That vision is driving the bioWidget Consortium, an informal alliance of academic, industry, and government researchers dedicated to developing reusable, interoperable software components for the graphical display of genomic data.
Key players in the year-old effort include David Searls of SmithKline Beecham Pharmaceuticals' Bioinformatics Group; Chris Overton of the University of Pennsylvania lab; Nathan Goodman of the Jackson Laboratory; Susannah Lewis of the University of California, Berkeley's Drosophila Genome Center; Gregg Helt of Neomorphic, Inc.; Stan Letovsky of the Genome Database at Johns Hopkins Medical Institutes; and Tom Flores of the European Bioinformatics Institute.
"The consortium is promoting the creation of a library of bioWidgets, which are software components that can be developed by many people, shared over the Worldwide Web, and easily assembled by users," Goodman told BioInform. "The goal is to have researchers spending less time reinventing software and more time searching for knowledge."
The consortium formed last August to address a problem long discussed within the bioinformatics community: the lack of software that can be both easily shared among large and small labs, and easily customized to meet individual researchers' needs.
Historically, architects of genome information systems faced two choices, Goodman and colleagues at the Massachusetts Institute of Technology's Whitehead Institute noted in 1995: "build it yourself so that it does exactly what you want, or adopt someone else's system and live with most of its quirks and limitations." However, according to a document on the bioWidget Consortium's Web page (http://goodman.jax.org/projects/biowidgets/consortium), "in the long term, neither of these approaches leads to an improvement in genomic software because they are not building incrementally upon previous work: the software development cycle always begins at ground zero."
To address this inefficiency, Goodman and others proposed an alternative called "componentry"--the concept of assembling powerful, flexible software systems from relatively simple components written by different developers following a few standard rules. One of the earliest researchers to experiment with bioinformatics componentry was Searls, who was then on the University of Pennsylvania faculty. In 1995 he released bioTk, a set of software components for displaying genomic data as graphic maps that users could examine in progressively greater levels of detail. Borrowing a common term from computer science, Searls called the tools "widgets," a term that soon evolved into "bioWidgets."
"This work was motivated by the observation that, even within our own group, there was a disturbing degree of reinvention in the development of a wide range of seemingly disparate graphical user interfaces," Searls explained in a document on the bioWidget Consortium's Web page. The problem could be avoided, he and colleagues reasoned, if researchers were able to reuse and assemble into more elaborate display programs a few seemingly "dumb" pieces of software that didn't care how they were receiving data or how the underlying database was organized. Widgets "need not attempt to model data or knowledge of the domain, beyond its characteristic graphical objects and their typical behaviors in applications," Searls wrote.
"A widget embodies the graphical appearance and graphical behavior of data, but does not model its underlying meaning," Goodman explained. "For example, a map widget embodies the knowledge that a genomic map consists of line segments and points arranged in a linear or circular geometry; it can draw a map in various formats and can perform typical display functions such as scroll and zoom; it also knows that when a user clicks on a map element, some further property of the element should be displayed. The widget does not know the differences among different kinds of maps, except insofar as this affects their appearance or graphical behavior."
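The following minimal Java sketch illustrates Goodman's description. Java is the language the consortium adopted, but every name here (MapWidget, MapElement, scroll, zoom, click, onClick) is a hypothetical illustration, not part of any published bioWidget or Core Widget Architecture API. The sketch assumes a linear geometry only and leaves out actual drawing. It models just the bookkeeping behind scrolling, zooming, and clicking, and the widget never learns what kind of map it is showing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Hypothetical sketch of a "dumb" map widget (not the consortium's API).
 * It knows that a map is made of segments and points on a linear axis and
 * how to scroll, zoom, and respond to clicks, but nothing about whether the
 * map is genetic, physical, or sequence-based.
 */
public class MapWidget {

    /** A graphical element: a segment (start < end) or a point (start == end). */
    public record MapElement(String label, double start, double end) {}

    private final List<MapElement> elements = new ArrayList<>();
    // The caller decides which "further property" to display on a click.
    private final Consumer<MapElement> onClick;
    private double viewStart;      // map coordinate at the left edge of the view
    private double unitsPerPixel;  // current zoom level

    public MapWidget(double viewStart, double unitsPerPixel, Consumer<MapElement> onClick) {
        this.viewStart = viewStart;
        this.unitsPerPixel = unitsPerPixel;
        this.onClick = onClick;
    }

    public void add(MapElement e) { elements.add(e); }

    /** Scroll the view by a number of pixels (positive scrolls right). */
    public void scroll(int pixels) { viewStart += pixels * unitsPerPixel; }

    /** Zoom by a factor (> 1 zooms in), keeping the map position under pixel anchorX fixed. */
    public void zoom(double factor, int anchorX) {
        double anchor = toMap(anchorX);
        unitsPerPixel /= factor;
        viewStart = anchor - anchorX * unitsPerPixel;
    }

    /** Convert a screen x coordinate to a map coordinate. */
    public double toMap(int x) { return viewStart + x * unitsPerPixel; }

    /** Convert a map coordinate to the nearest screen x coordinate. */
    public int toScreen(double pos) { return (int) Math.round((pos - viewStart) / unitsPerPixel); }

    /** Deliver a click at pixel x to the first element hit, with 2 pixels of tolerance. */
    public boolean click(int x) {
        for (MapElement e : elements) {
            int left = toScreen(e.start());
            int right = toScreen(e.end());
            if (x >= left - 2 && x <= right + 2) {
                onClick.accept(e);
                return true;
            }
        }
        return false;
    }
}
```

The application, not the widget, supplies the onClick handler. Knowledge of what a marker or clone actually means would live there, which keeps the widget reusable across different kinds of maps.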
Searls' work was soon picked up by other researchers, including Berkeley graduate student Gregg Helt, who transformed bioTk into TkPerl. Before either display technology attracted wide use, however, the network programming language Java emerged as an attractive, Web-based alternative for developing bioWidgets. Soon, programmers at both Penn and Berkeley were working, first independently and then together, to adapt their widgets to Java.
"When the Web grew and Java struck, everyone who was trying to build user interfaces sort of junked them and started over," Goodman recalled.
The fresh start prompted by Java was marked by increasing collaboration among a growing number of bioWidget researchers. Last August they gathered at the University of Pennsylvania to formally found the consortium. There and at a second meeting held in Berkeley this year, the group began tackling some of the obstacles to realizing its vision, including agreeing on software standards, developing a legal framework for sharing widgets, and establishing a process for testing, debugging, and distributing new widgets.
Agreeing on Standards
Currently, researchers are circulating a draft statement of "Core Widget Architecture" that defines the Java objects that all bioWidgets must have in common so they can work together and have the same look and feel. "We've pushed things forward and are close to achieving agreement," said Letovsky. "Once it is out there, people can tweak their code into compliance."
Using the bioWidget Consortium as their model, researchers also appear ready to make progress on a related but more difficult problem: developing standards for data transfer, so that programmers don't have to write specialized translation programs for each component. At last month's Objects In Bioinformatics conference at the European Bioinformatics Institute in Hinxton, U.K., SmithKline Beecham's David Benton called the bioWidget Consortium "probably the most successful attempt to create public-domain components for bioinformatics systems," and proposed "the establishment of a federation of consortia to establish the required standards and to build, test, and document software components that are beyond the scope of the current consortium." The proposal is reportedly being pursued within the Object Management Group, the international standards-setting body responsible for publishing the CORBA standards for sharing objects across systems. Researchers hope an agreement in what one calls "a dispute as complicated as the Arab-Israeli conflict" will arrive within a few months.
The question of whether the consortium will ever formally incorporate as a nonprofit organization hinges, in large part, on the outcome of ongoing discussions over software licensing. At its first meeting, "the consortium took a strong stand on intellectual property issues, insisting that all software developed through the consortium be freely available and redistributable," Goodman noted. However, that approach is being resisted by large universities and industry members of the consortium, who believe widgets have commercial potential. A compromise under discussion would have the consortium encourage, rather than require, software developers to make their executable files freely available to academics, with no obligation to release source code. This solution troubles researchers such as Goodman, who said it would deny scientists the opportunity to improve the source code.
At the moment, legal discussions are well ahead of the consortium's tangible products. So far, very few bioWidgets are ready for distribution, but the pace is expected to pick up once the legal issues are settled. In the meantime, the University of Pennsylvania lab has sent an updated prototype DNA sequence display widget, which displays a sequence of letters, usually DNA bases, with annotation, to Goodman's lab for quality assurance testing. Goodman, who expects to take on the testing task for many of the new widgets, estimated that debugging a "large" widget containing 5,000 lines of code will take one programmer two months. He has proposed that original developers be responsible for fixing major bugs.
Debugged widgets will be made available to researchers through a website repository maintained by Goodman at http://goodman.jax.org/projects/biowidgets/consortium/repository.html. So far the library holds one released widget and a wide range of works in progress. The released software, which Goodman used to test his quality assurance process, is a DnaDisplay Widget developed by Will Fitzhugh at MIT's Whitehead Genome Center. It displays a DNA sequence or match as a colored bar. Exactly how new bioWidgets will be designed, tested, and distributed is expected to be a major topic of discussion at a week-long consortium "boot camp" to be held at the Jackson Lab in October.
Goodman believes that in addition to the practical benefits the consortium's software-sharing effort may produce, the concept also "offers compelling social benefits." It has the potential "to encourage community-based software development," he said.
"Software packages such as Perl, Tcl/Tk, and many others gain tremendous value from incremental improvements made by software developers throughout the community. Such community efforts can gel more readily when the community adopts a culture in which software-sharing is the norm, and system developers expect, as a matter of routine, to incorporate other people's software in the systems they are building," Goodman observed. If the bioWidget Consortium is successful, such incorporation will soon become the rule, instead of the exception, he concluded.
--David A. Malakoff