MyGrid just may be the most buzzword-compliant project in bioinformatics. Launched in October 2001, the UK-based project promises to marry grid computing, semantic web technologies, web services, and agent-based software engineering. The goal is a comprehensive platform that will not only harness distributed computational resources, but will also seamlessly link databases, analytical tools, and free-text information from the literature.
The ambitious three-year project has already earned the support of several pharmaceutical companies that view the effort as an important step in the struggle to manage the growing tangle of distributed data, analytical, and computational resources.
MyGrid’s plan to combine the semantic web and the grid “really addresses a lot of the technical issues that we feel are important for our industry,” said Robin McEntire, director of ontologies and agent systems at GlaxoSmithKline. The adoption of this new platform, McEntire noted, falls in line with what GSK foresees as “a significant change in the industry over the next five to ten years. …We’re going to find best of breed, and maybe several different variations on best of breed, for algorithms and for data services, for picking up data from databases. I think that we’re going to start thinking more in a service mentality.”
Looking ahead to that service-based model, AstraZeneca and Merck KGaA have joined GSK in pledging support for the development of MyGrid, which is led by Carol Goble’s group at the University of Manchester. Four other UK universities, as well as the European Bioinformatics Institute (EBI) and five additional commercial partners (IBM, Sun, GeneticXchange, Network Inference, and Epistemics), round out the consortium.
While not contributing any specific technologies to MyGrid, the pharmaceutical firms play a central role in the development process. Tasked with providing use cases and evaluating the project’s ongoing work in real-world production environments, the firms serve as “the ties to earth” for the project’s lofty goals, according to Luca Toldo, a bioinformatics scientist at Merck KGaA who sits on the MyGrid industrial board. The MyGrid developers “need a user specification like any good IT development project, but getting a user specification from a biologist is impossible,” Toldo said, “so having the three of us committed to providing this feedback is a sure way for them to get this information.”
Playing a role in the embryonic stage of the technology’s development also ensures that MyGrid will mesh with the research computing needs of the industrial partners. GSK, which expects to have an initial version of the software in place by the end of the calendar year, is seeking “an infrastructure that allows us to automatically find the kinds of services we’re interested in, so that we can build systems from components in a dynamic way, where the components will be in part MyGrid components and in part GSK components,” McEntire said.
MyGrid falls under the UK’s E-Science initiative, which has awarded £120 million over three years to six different grid projects. A detailed proposal for the MyGrid project (http://mygrid.man.ac.uk/more.shtml) highlights the collaborative nature of the planned resource, as well as its ability to be personalized to meet the needs of individual researchers. The planned environment will permit researchers to construct in silico experiments, find and adapt other such experiments, have their own view of local and public repositories, and maintain a record of the location and status of tools and data directly relevant to their work. “The Grid becomes egocentrically based around the scientist,” the proposal notes, hence the “MyGrid” moniker.
The project will build upon existing open standards, including semantic web technologies (DAML+OIL), web services (UDDI), and emerging grid standards (Globus and the Storage Resource Broker) to facilitate data and tool interoperability. The project is a designated early adopter of Globus’s Open Grid Services Architecture. MyGrid will also employ existing interoperability initiatives within the life sciences, such as the Distributed Annotation System and OpenBSA. The MyGrid developers will add to these existing efforts, focusing on the design and specification of the metadata and communication protocols that will make the distributed platform appear as a single entity to the end user.
Beyond the Firewall
The project is promising for pharmaceutical research groups coming to terms with the fact that they can no longer house all their data, analytical tools, and computational resources in a single location. Noted Toldo, “It’s no longer possible to think that a scalable architecture is that of internalizing everything. We have to be able to make use of distributed resources, and so far no methods exist that allow this easily.”
Toldo added that MyGrid “has the potential to overcome this problem by an appropriate handling of the ontological problem, of the semantic integration of the data and services, in a secure and distributed way.”
McEntire added that while it would be possible for an in-house team at GSK to develop a similar services-based approach, the company would rather leave it to consortia like MyGrid “to come up with common standards so we don’t have to spend our resources building and rebuilding the infrastructure. … As a pharmaceutical company, we’re simply not interested in that. We’re just interested in having these things at our researchers’ fingertips,” he said.
Why is pharma jumping on board now, when other consortium-led interoperability efforts have received a cool reception? According to both Toldo and McEntire, one reason is that MyGrid has a better chance of achieving its goal than previous attempts by groups such as the Object Management Group (OMG). The OMG’s CORBA, for example, was attractive at first, but turned out to be “too rigid” and “not firewall-friendly,” Toldo said. McEntire said that while GSK remains an active member of the OMG’s Life Sciences Research domain task force, “they’ve had some problems getting adoption of a lot of their stuff, and we think that XML, the grid, and the semantic web will leverage some of the service descriptions that have been created by the OMG LSR.”
And GSK isn’t putting all of its eggs in the MyGrid basket. The company is also “looking very closely at the I3C and tracking what they’re doing,” McEntire said. “We’ve seen a few things come out of that, but we’re in a wait-and-see pattern, as we are with others as well.”
Added McEntire, “We’re not so concerned within GSK that one particular group succeed. We’re just interested in having the best group drive this forward.”
MyGrid, which expects to be this ‘best group,’ plans to begin releasing its middleware under an open source license in June 2003. The project is scheduled for completion in March 2005.