Building on its deep roots within the bioinformatics open source community, Electric Genetics has embarked upon a project to scrutinize the popular BioPerl package line by line. The effort is expected to improve the quality of the publicly available code base while simultaneously bolstering EG’s latest offering: a consulting service focused on helping bioinformatics teams in industry use open source software more effectively.
Peter van Heusden, a software engineer leading the BioPerl validation project at EG, discussed the rationale behind the effort as well as some early findings at the Bioinformatics Open Source Conference that preceded this year’s ISMB/ECCB meeting in Glasgow, UK, in July.
Van Heusden told BioInform that the validation effort began very recently, so “we haven’t gotten far enough to get the big picture” regarding the overall quality of the BioPerl code base. Nevertheless, based on an initial evaluation of some widely used features such as the SeqIO module for parsing different file formats, he said that the package appears to be fairly robust. “We’re finding plenty of things that work,” he said. “The problem is the things that don’t work.”
Most glitches have been minor so far (typos that lead to unexplained exception errors, for example), but van Heusden noted that even seemingly trivial failings can lead to considerable delays in a production environment. Even the most mature open source bioinformatics software lacks a crucial feature found in most commercial packages: a help desk. Users who encounter a problem are forced to either find (and fix) it themselves, or hope for a timely answer via the project mailing list. Van Heusden said that commercial firms, which “don’t want their employees spending their time on software development and support,” are unlikely to embrace open source bioinformatics tools without some kind of “guarantee that the software works as advertised.”
EG’s goal is to provide that guarantee in the form of detailed documentation that describes how the existing code should work, along with a validation suite to test how well the software is actually performing. The documentation will be offered to EG’s customers as a commercial product, and the validation suite will form the basis of the company’s commercial validation services offering. While these products will not be placed in the BioPerl repository, van Heusden said that EG does plan to contribute bug fixes and code additions to BioPerl as the project progresses. The company plans to use the same validation framework for other open source bioinformatics projects, such as BioJava.
Van Heusden said that EG has encountered “concern” among commercial pharmaceutical firms that are interested in using open source tools but wary of the risk of software failure. In addition, he said, as the fields of bioinformatics and clinical informatics converge, the validation step that is currently required for clinical trials software development may soon be necessary for bioinformatics applications as well, particularly following the FDA’s recent guidance on pharmacogenomics and microarray data.
Following this trend, van Heusden said that EG is using the FDA’s definition of software validation as a guideline for the project: “confirmation by examination and provision of objective evidence that software specifications conform to user needs and intended uses, and that the particular requirements implemented through software can be consistently fulfilled.” The challenge, he said, is that the collaborative nature of open source development rules out “formal” software validation procedures like TQM and ISO 9001. The “permanent beta” status of most open source projects, combined with a tendency for some components to be used far less often than others (which renders the “many eyes make all bugs shallow” development approach ineffective), makes open source software validation a “scary” process, he said. Nevertheless, EG is tackling these challenges with hopes that commercial users will turn to the company as “the interface between industry and the development community.”
In that regard, EG has already established itself as a strong supporter of open source development. The company sponsored the first bioinformatics “hackathon” in 2002 [BioInform 03-11-02], and maintains close ties with the core BioPerl development team.
Ewan Birney of the European Bioinformatics Institute confirmed that the “relationship is very good” between EG and the BioPerl core developers. “It’s great what they’re doing,” he added. While the company’s work on improving the code base is helpful, he said, “the more important piece is the social aspect.” Commercial firms who opt for EG’s services will have the same support — and “screaming rights” — with open source software that they now have with commercial packages, he said.
EG CEO Tania Hide said that the company is currently in discussions with several pharmaceutical firms regarding its software validation services, but has not yet closed a deal.