The Asia Pacific Bioinformatics Network is bringing together a number of industry partners to build a grid-based system to share bioinformatics applications and workflows throughout the Asia Pacific region. The testbed project, a brainchild of Tan Tin Wee, secretariat of APBioNet and an associate professor at the National University of Singapore, is built upon an application integration platform provided by KOOPrime, a Singapore-based software company. Other partners so far include Lion Bioscience, which will provide its SRS data retrieval technology, and Cray, which has been developing or adapting bioinformatics software for Cray supercomputers in collaboration with NUS.
The pilot project might lead to a first version of what Tan described as a “BioWorldWideWorkFlow” that would be available to the 400-member APBioNet community. It would enable members to share not only computational resources and applications, but also ways of connecting applications or instruments. “It’s like the web. This time, we share workflows, and we share idle CPU cycles and applications, and the way we use our applications as encapsulated in our workflows,” said Tan. The aim is to increase the throughput of research: “Too many people have been filling too many web forms over and over again.”
Tan is planning to present and demonstrate the architecture to grid users, educators and bioinformaticists at various upcoming conferences, starting with the Pacific Rim Application and Grid Middleware Assembly in Korea this month. Beta-testing by end-users as well as other industry players will begin after he gathers some feedback. “We feel that this way forward, we will not be stuck at the level of arguing about standards, but actually engineering the framework and really testing out what works and what doesn’t; because right now, nobody knows what will work in the world of interoperable systems,” Tan said.
Protocols and specifications currently being developed by the Interoperable Informatics Infrastructure Consortium and others are prerequisites for the workflows, he said, and he is hoping to integrate them “as soon as we are clear what their standards are.”
At the heart of the project is NUS spin-off KOOPrime, whose KOOP (knowledge object-oriented programming) application integration technology was first developed in 1996 for a bioinformatics project involving the university’s Institute of Molecular and Cell Biology and the Glaxo-Wellcome-funded Center for Natural Product Research in Singapore. Members of the university’s bioinformatics center developed interfaces connecting CNPR’s high-throughput drug screening equipment. Launched in 2000, KOOPrime is now targeting primarily the life sciences for the commercial version of its software. Tan said that he persuaded KOOPrime to develop a ‘light-weight’ edition — KOOPlite — for the testbed, which users will be able to download free of charge to view the workflows others have published and create their own from available applications.
The KOOPrime software wraps data sources and applications with what the company terms “bubbles.” These objects can then be linked to form data capture and analysis workflows. So far, Tan and others have written over 200 bubbles — including 160 for Emboss, 40 for Phylip, 20 Unix utilities, 12 for MySQL, and others for Globus, Blast, Fasta, and ClustalW — that they plan to build into prototype workflows.
Eventually, “users can download bubbles that others published, and incorporate them into their own workflows, or even share their workflows about how they built and maintained a database, or performed a calculation,” said Tan. These workflows can be sent to the grid layer for execution.
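The wrap-and-link idea described above can be illustrated with a minimal Python sketch. It is not KOOPrime's API, which the article does not describe: the `Bubble` and `Workflow` classes, their methods, and the two toy sequence functions are all hypothetical stand-ins, assuming only that a bubble wraps one tool and that a workflow passes each bubble's output to the next.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Bubble:
    """Hypothetical wrapper around one application or data source."""
    name: str
    run: Callable[[str], str]  # takes input text, returns output text


@dataclass
class Workflow:
    """Hypothetical ordered chain of bubbles; each output feeds the next."""
    name: str
    bubbles: List[Bubble] = field(default_factory=list)

    def link(self, bubble: Bubble) -> "Workflow":
        self.bubbles.append(bubble)
        return self  # returning self allows calls to be chained

    def execute(self, data: str) -> str:
        for bubble in self.bubbles:
            data = bubble.run(data)
        return data


# Toy stand-ins for wrapped tools (not real Emboss or Phylip calls).
def strip_fasta_header(text: str) -> str:
    """Drop FASTA header lines and join the sequence lines."""
    return "".join(
        line.strip() for line in text.splitlines() if not line.startswith(">")
    )


def reverse_complement(seq: str) -> str:
    """Reverse-complement a DNA string; unknown bases become N."""
    pairs = {"A": "T", "T": "A", "C": "G", "G": "C"}
    return "".join(pairs.get(base, "N") for base in reversed(seq.upper()))


workflow = (
    Workflow("revcomp_from_fasta")
    .link(Bubble("strip_header", strip_fasta_header))
    .link(Bubble("reverse_complement", reverse_complement))
)
result = workflow.execute(">seq1\nATGC\nGG\n")
print(result)
```

In this sketch a published workflow is just a named list of bubbles, so sharing one would amount to sharing that list plus the wrappers it refers to. How the actual software serializes workflows or hands them to a grid layer is not covered in the article and is not modeled here.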
KOOPrime sees the project as a welcome opportunity to test its software on a grid. Anwar Chan, the company’s VP of business development, said KOOPrime chose bioinformatics as its testing ground “for the complexity of the data records.”
Lion, Cray on Board. Will Others Follow?
In addition to KOOPrime, Lion Bioscience and Cray have so far agreed to be part of the testbed. The project provided an opportunity for APBioNet to collaborate with Lion, Tan said, after they had tried unsuccessfully to organize training workshops for APBioNet members on SRS for several months. For Lion, the integration of its product into the testbed “increases the visibility for SRS,” said Thure Etzold, chief architect of SRS, and gives the company a foothold in grid computing. “If it works, then this is something that we would like to use ourselves for building our solutions.” While APBioNet is most interested in using SRS as a data retrieval system, “the entire grid system and the workflow system is something that could be integrated in SRS,” Etzold said.
Cray has been collaborating with the National University of Singapore for several months, both on an SV1-based BioCray grid computing network, and on a Cray Bioinformatics Library, which contains a number of subroutines that can be utilized by biologists to speed application software running on Cray systems. One of the collaboration’s research objectives has been to show how these subroutines can be easily accessed via workflow software that ties together databases, computational resources, and end-user biologists.
Cray’s incentive to work with APBioNet, like Lion’s, is to expose the company to a wide audience in Southeast Asia, he said.
Tan welcomes the involvement of additional industry players in the project. Future partners, he said, could include GeneticXchange, a 1997 spin-off from NUS that provides database integration systems, and possibly Sun and Compaq.
The testbed gives commercial partners “the chance to work with us to realize an interesting way of looking at how bioinformatics can be done over the grid, over the internet. And it’s just something that possibly nobody has tried before on that scale,” Tan said.