The organization that brought you HTML, XML, and pretty much every other acronym that makes the web function may play a big role in the next generation of bioinformatics tools. As the World Wide Web Consortium’s (W3C’s) semantic web development effort gathers steam, the W3C is strengthening its ties with the life science informatics community, which it has identified as an important early adopter of semantic web technologies.
The feeling is mutual. The Interoperable Informatics Infrastructure Consortium (I3C) recently signed on as a formal member of the W3C and also agreed to sponsor John Wilbanks, previously CEO of InCellico, as an “I3C fellow” on the W3C team to act as a liaison between the life science informatics and web development communities.
Both sides stand to benefit from the arrangement. Close ties with the W3C will certainly lend additional weight to the I3C’s standards efforts, which have until now been slow to win widespread support in the life science informatics community. Conversely, the W3C views the life sciences as a key driver for its emerging semantic web technologies. According to Eric Miller, W3C’s activity lead for the semantic web initiative, “The life sciences community may be to the semantic web what the physics community was to the original web, with Tim Berners-Lee working with CERN and the physicists there to put together the early web infrastructure.”
Andy Palmer, senior vice president of operations at Infinity Pharmaceuticals and president of the I3C, said that the I3C and the W3C have been “working very intimately together” for the past six months or so. Added Miller, “It’s actually been a very interesting symbiotic and mutually beneficial exercise so far, in working with some of the leaders in the life sciences community and trying to see if their requirements match up to the kinds of technologies that we’re doing. And in turn, we benefit from learning how these are used in real-world examples.”
Joining the W3C made sense, Palmer said, because “many of the technologies that the I3C had identified as necessary in order to improve interoperability in the life sciences were being developed over at the W3C,” a fact driven home when he first suggested W3C membership to some I3C participants. “It turns out that a lot of the people who were working on I3C stuff were also working on these other activities at the W3C,” he said. Membership also offers logistical advantages. “It was the consensus of the board of directors of the I3C that many of the processes that we were considering in terms of standard setting and information dissemination were redundant with many of the processes that already existed at the OMG [Object Management Group] and W3C,” Palmer said. “So there was a real commitment on the part of the board not to re-invent the wheel, and to leverage the infrastructure and administrative processes that were already in place at these other organizations to ensure that we weren’t creating additional administrative infrastructure.”
Phase 2: Deployment
The W3C’s semantic web effort hit an important milestone in February, when the organization released the Resource Description Framework (RDF) and the OWL Web Ontology Language as W3C recommendations. Since then, Miller said, “literally a flood of tools and applications and demonstrations came out of the woodwork.” Now, the initiative has entered what Miller called “Phase 2 of the semantic web activity. The primary focus of this phase is, in short, deployment.”
This step in the semantic web’s development will require identifying real-world applications for technologies like RDF and OWL, Miller said. As the HTML of the semantic web, RDF provides a framework for describing objects with semantics — or meaning — in addition to syntax. RDF can be used to define relationships between objects, especially when deployed in conjunction with OWL, which offers a standard language for defining ontologies for particular domain areas. But these specifications — while promising — are of little use without search engines, browsers, and other web-based applications that can put them to work.
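To make “describing objects with semantics” a little more concrete, here is a minimal, library-free Python sketch. It assumes a toy in-memory representation in which each RDF statement is a (subject, predicate, object) triple of prefixed names. The `ex:` terms and the small class hierarchy are hypothetical; `rdf:type` and `rdfs:subClassOf` are standard W3C vocabulary that OWL ontologies build on. The `infer_types` function applies two RDF Schema subclass rules, so a program can conclude that `ex:proteinA` is an `ex:Protein` even though no one stated it directly. A real application would rely on a toolkit such as Jena rather than on this sketch.

```python
# A toy model of RDF triples plus RDF Schema subclass inference.
# The "ex:" terms are hypothetical; rdf:type and rdfs:subClassOf are
# standard W3C terms, written as prefixed names for brevity.
from typing import Set, Tuple

Triple = Tuple[str, str, str]

RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"


def infer_types(graph: Set[Triple]) -> Set[Triple]:
    """Return the graph plus every triple implied by rdfs:subClassOf.

    Repeats two rules until no new triple appears:
      A subClassOf B, B subClassOf C  =>  A subClassOf C   (transitivity)
      x type B,       B subClassOf C  =>  x type C         (type propagation)
    """
    closed = set(graph)
    while True:
        new: Set[Triple] = set()
        for (a, p1, b) in closed:
            for (c, p2, d) in closed:
                if b != c or p2 != SUBCLASS_OF:
                    continue
                if p1 == SUBCLASS_OF:
                    new.add((a, SUBCLASS_OF, d))
                elif p1 == RDF_TYPE:
                    new.add((a, RDF_TYPE, d))
        if new <= closed:  # fixed point reached: nothing left to add
            return closed
        closed |= new


# Hypothetical toy ontology and instance data.
graph = {
    ("ex:Kinase", SUBCLASS_OF, "ex:Enzyme"),
    ("ex:Enzyme", SUBCLASS_OF, "ex:Protein"),
    ("ex:proteinA", RDF_TYPE, "ex:Kinase"),
    ("ex:proteinA", "ex:interactsWith", "ex:proteinB"),
}

for triple in sorted(infer_types(graph)):
    print(*triple)
```

The nested loop is deliberately naive; it keeps the rule logic visible at the cost of efficiency on large graphs.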
These tools are just starting to come online, and the life science community has provided some of the first examples. MIT’s RDF-based Haystack browser project [BioInform 07-28-03], for example, has released the so-called BioHaystack browser for biological data that uses the I3C’s LSID identifier specification. In addition, Lincoln Stein of Cold Spring Harbor Laboratory has begun a new project under the BioMoby initiative called S-Moby, for Semantic Moby. This project, according to co-PI Damien Gessler of the National Center for Genome Resources, is based on the concept that “the heart of the integration problem in bioinformatics is not the transfer of information back and forth; it’s the meaning of what you’re sending back and forth.” A preliminary version of S-Moby is available now, Gessler said, adding that a version “more for public use” should be available in about six weeks. (See below for a list of life science semantic web projects.)
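The LSID specification mentioned above gives each biological object a URN-style name. The sketch below splits one into its parts, assuming the commonly documented form `urn:lsid:authority:namespace:object[:revision]`. The example identifier is made up, and the function checks syntax only; resolving an LSID to its data or metadata is a separate step that this sketch does not attempt.

```python
# Syntax-only LSID parsing, assuming the form
# urn:lsid:<authority>:<namespace>:<object>[:<revision>].
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LSID:
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


def parse_lsid(text: str) -> LSID:
    """Split an LSID into its components; raise ValueError if malformed."""
    parts = text.strip().split(":")
    # Expect "urn", "lsid", authority, namespace, object, and an optional revision.
    if len(parts) not in (5, 6) or [p.lower() for p in parts[:2]] != ["urn", "lsid"]:
        raise ValueError(f"not an LSID: {text!r}")
    if not all(parts[2:]):
        raise ValueError(f"empty LSID component: {text!r}")
    revision = parts[5] if len(parts) == 6 else None
    return LSID(parts[2], parts[3], parts[4], revision)


# Hypothetical identifier, for illustration only.
print(parse_lsid("urn:lsid:example.org:proteins:abc123:2"))
```

The `urn:lsid:` prefix is compared case-insensitively here, while the remaining components are kept exactly as written.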
I3C fellow Wilbanks said that in addition to encouraging projects such as these, his primary role at W3C is to “identify what are the real-world applications for the W3C’s semantic web technologies in life sciences.” The pharmaceutical industry could see short-term benefits from the semantic web in “basic knowledge management and tracking,” he said, noting that pharma currently has no way to track its decision-making process in a reusable manner. But RDF, he said, “is a very easy way to write down statements that could easily tell you why you decided to move a target forward, and that could be everything from the intellectual property characteristics of the target, to the druggability characteristics of the target, to where the target sits in a biological network … Once you’ve made that decision, you can track back and see what was a good decision and what was a bad decision.”
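Wilbanks’s decision-tracking idea can be sketched as a handful of triples. Everything in the `ex:` namespace below is a hypothetical vocabulary invented for illustration, not a published ontology, and plain Python tuples stand in for a real RDF store. The point is that the reasons behind a decision are ordinary statements that can be queried after the fact, alongside facts about where the target sits in a network.

```python
# Recording and querying a target decision as RDF-style triples.
# All "ex:" terms are hypothetical, not from any published ontology.
from typing import Iterable, List, Tuple

Triple = Tuple[str, str, str]

statements: List[Triple] = [
    ("ex:decision42", "ex:concerns", "ex:targetT1"),
    ("ex:decision42", "ex:outcome", "ex:advance"),
    ("ex:decision42", "ex:basedOn", "ex:ipPositionClear"),
    ("ex:decision42", "ex:basedOn", "ex:druggableBindingSite"),
    ("ex:targetT1", "ex:participatesIn", "ex:pathwayP9"),
]


def objects(graph: Iterable[Triple], subject: str, predicate: str) -> List[str]:
    """All objects o such that (subject, predicate, o) is in the graph, sorted."""
    return sorted(o for (s, p, o) in graph if s == subject and p == predicate)


def decisions_about(graph: Iterable[Triple], target: str) -> List[str]:
    """All decisions recorded as concerning the given target, sorted."""
    return sorted(s for (s, p, o) in graph if p == "ex:concerns" and o == target)


# Looking back: what was decided about this target, and on what grounds?
for decision in decisions_about(statements, "ex:targetT1"):
    print(decision, "outcome:", objects(statements, decision, "ex:outcome"))
    print("  reasons:", objects(statements, decision, "ex:basedOn"))
print("pathways:", objects(statements, "ex:targetT1", "ex:participatesIn"))
```

Because each reason is its own statement, adding a new consideration later means adding a triple rather than changing a table schema.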
The second promising area Wilbanks has identified for semantic web technologies in life science informatics is biological networks, “where you really need to have some sort of evolvable, network-driven approach” to represent the constantly changing state of the data and the interactions between particular data points.
“What I’m really seeing is a groundswell of activity in this area that even three months ago I wasn’t aware of,” Wilbanks said. Nevertheless, much work needs to be done before semantic web elements become a common tool in bioinformatics development. One barrier, for example, is the lack of data in RDF or OWL format. Unless large data resources start providing their information in a form that semantic web tools can read, the present flurry of tool development will be in vain. Despite a few notable steps in this direction — Affymetrix and the UniProt database have begun to release some of their data in RDF — there are “extremely few” databases available in semantic web-ready format, Gessler said. Palmer agreed, placing NCBI, MDL, Thomson, and the Chemical Abstracts Service at the top of his wish list for organizations he’d like to see “embrace fundamental standards like RDF and OWL.”
Palmer said the I3C and the W3C semantic web group are planning to hold a workshop in the fall to ensure that LSID and other I3C projects are in line with the semantic web’s technology roadmap. “It’s in the context of making sure that any work we do is very highly leveraged,” he said. “We don’t want to come back three years from now and do it all over again.”
Thinking of giving semantic web technology a try? Here are a few places to start:
General Semantic Web Resources
- W3C’s semantic web page: http://www.w3.org/2001/sw/
- Cwm (a general data processor for the semantic web): http://www.w3.org/2000/10/swap/doc/cwm.html
- The European Union’s REWERSE (Reasoning on the Web with Rules and Semantics) project: http://rewerse.net/
- Jena (HP’s semantic web framework for Java): http://jena.sourceforge.net/
- IBM Semantics Toolkit: http://www.alphaworks.ibm.com/tech/semanticstk
Semantic Web Life Science Resources and Projects
- W3C’s public mailing list for the semantic web for life sciences: http://lists.w3.org/Archives/Public/public-semweb-lifesci/
- REWERSE bioinformatics working group: http://comas.soi.city.ac.uk/rewerse-a2/
- BioHaystack (semantic web browser for LSID-accessible biological data): http://haystack.lcs.mit.edu/staging/eclipse-download.html
- Semantic Moby (S-Moby): http://biomoby.org/S-MOBY/doc/Design/S-MOBY_Design_Overview.html
- Preliminary RDF version of UniProt: http://www.isb-sib.ch/~ejain/rdf/
- Affymetrix data in RDF: http://www.affymetrix.com/community/publications/affymetrix/tmsplice/index.affx