Despite the complexities associated with semantic technologies, efforts to adopt the approach for drug development are bearing fruit, according to several presentations at last week's Conference on Semantics in Healthcare and Life Sciences in Cambridge, Mass.
In a conversation with BioInform, Ted Slater, head of knowledge management services at Merck and the CSHALS conference chair, described this year's meeting as "the strongest program" in the four years of its existence.
"Four years ago ... nobody [really] knew about [semantics]," Slater said. "Now we are at the point where we're talking about ... expanding the scope a little bit [and asking,] 'What else can we add into the mix to make it a more complete picture?'"
This year's conference began with a series of hands-on tutorials coordinated by Joanne Luciano, a research associate professor at Rensselaer Polytechnic Institute, that were intended to show how the technology can be used to address drug development needs.
During the tutorials, participants used semantic web tools to create mashups using data from the Linked Open Data cloud and semantic data that they created from raw datasets. Participants were shown how to load data into a "triple store," a database that holds data as subject-predicate-object statements; query it using the semantic query language SPARQL; use inference to expand experimental knowledge; and build dynamic visualizations from their results.
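The load, query, and infer steps covered in the tutorials can be illustrated with a minimal sketch in plain Python. This is not the tutorial material or a real triple store: the `TripleStore` class, the `infer_types` helper, and the example facts are all hypothetical, and the wildcard matching only loosely mimics what a single SPARQL triple pattern does. The inference rule shown is the familiar subclass rule (if x is of type C and C is a subclass of D, then x is also of type D).

```python
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


class TripleStore:
    """A toy in-memory triple store (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        self.triples: Set[Triple] = set()

    def add(self, s: str, p: str, o: str) -> None:
        self.triples.add((s, p, o))

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> List[Triple]:
        """Return triples matching the pattern; None acts as a wildcard."""
        return sorted(
            t for t in self.triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)
        )


def infer_types(store: TripleStore) -> None:
    """Apply the subclass rule repeatedly until no new triples appear."""
    changed = True
    while changed:
        changed = False
        # match() returns a new list, so adding to the store here is safe.
        for x, _, c in store.match(p="type"):
            for _, _, d in store.match(s=c, p="subClassOf"):
                if (x, "type", d) not in store.triples:
                    store.add(x, "type", d)
                    changed = True


# Made-up example facts, chosen only to exercise the code.
store = TripleStore()
store.add("aspirin", "type", "NSAID")
store.add("NSAID", "subClassOf", "AntiInflammatory")
store.add("AntiInflammatory", "subClassOf", "Drug")
store.add("aspirin", "inhibits", "COX1")

before = store.match(p="type", o="Drug")  # nothing is explicitly a Drug yet
infer_types(store)
after = store.match(p="type", o="Drug")   # now includes the inferred triple

print(before)
print(after)
```

In practice, a dedicated RDF library and a SPARQL endpoint would replace both the class and the hand-written rule; the sketch only shows why inference can surface facts that were never stated directly.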
Luciano told BioInform that this was the first year that CSHALS offered practical tutorials and the response from participants was mostly positive. Furthermore, the tutorials were made available for users in the RDF format so that “we were in real time, during the tutorial, able to run parallel tracks to meet all the needs of the tutorial participants,” she said.
While it's clear to proponents that semantic technology adds value to data, several speakers at the conference indicated that there is room for improvement and that much of the community remains unaware of the advantages that the semantic web offers.
For example, Lawrence Hunter, director of the computational bioscience program and the Center for Computational Pharmacology at the University of Colorado, pointed out that the field is still lacking good approaches to enable "reasoning" or, in other words, to figure out how "formal representations of data can get us places that simple search and retrieval wouldn’t have gotten us."
During his presentation, John Madden, an associate professor of pathology at Duke University, highlighted several factors that need to be considered in efforts to "render" information contained in medical documents, such as laboratory reports, physicians' progress notes, and admission summaries, in the RDF format.
A major challenge for these efforts, he said, is that these documents contain a lot of "non-explicit information" that's difficult to capture in RDF, such as background medical domain knowledge; the purpose of the medical document and the intent of the author; "hedges and uncertainty"; and anaphoric references, which he defined as "candidate triples where it's unclear what the subject is."
Yet despite its complexities, many researchers are finding useful applications for the technology. For example, Christopher Baker of the University of New Brunswick described a prototype of a semantic framework for automated classification and annotation of lipids.
The framework comprises an ontology developed in OWL-DL that uses structural features of small molecules to describe lipid classes, and two federated semantic web services deployed within the SADI framework: one that identifies relevant chemical "subgraphs" and a second that "assigns chemical entities to appropriate ontology classes."
Other talks from academic research groups described an open source software package based on Drupal that can be used to build semantic repositories of genomics experiments and a semantics-enabled framework that would keep doctors abreast of new research developments.
Semantic technologies are also finding their way into industry. Sherri Matis-Mitchell, principal informatics scientist at AstraZeneca, described the first version of the firm’s knowledgebase, called PharmaConnect, which was released last October and integrates internal and external data to provide connections between targets, pathways, compounds, and diseases.
Matis-Mitchell explained that the tool allows users to conduct queries across multiple information sources "using unified concepts and vocabularies." She said that the idea behind adopting semantic technologies at AstraZeneca was to shorten the drug discovery timeframe by bringing in "knowledge to support decision-making" earlier on in the development process.
The knowledgebase is built on a system called Cortex and receives data from four workstreams. The first is chemistry intelligence, which supports specific business questions and can be used to create queries for compound names and structures. The second is competitive intelligence, which provides information about competing firms' drug-development efforts. The final two streams are disease intelligence, which is used to assess drug targets, and drug safety intelligence.
In a separate presentation, Therese Vachon, head of the text mining services group at the Novartis Institutes for Biomedical Research, described the process of developing a federated layer to connect information stored in multiple data silos based on "controlled terminologies" that provide "uniform wording within and across data repositories."
Is the Tide Turning?
At last year's CSHALS, there was some suggestion that pharma's adoption of semantic methods was facing the roadblocks of tightening budgets, workforce cuts, and skepticism about the return on investment for these technologies (BI 03/05/2010).
Matis-Mitchell noted in an email to BioInform that generally new technologies take time to become widely accepted and that knowledge engineering and semantic technologies are no different.
She said her team overcomes this reluctance by regularly publishing its "successes to engender greater adoption of the tools and methods." While she could not provide additional details about these successes in the case of PharmaConnect for proprietary reasons, she noted that the "main theme" is that it "helped to save time and resources and supported more efficient decision making."
However, some vendors now feel that drug developers may be willing to give semantic tools a shot and are gearing up to provide products that support the technology.
In one presentation, Dexter Pratt, vice president of innovation and knowledge at Selventa, presented the company's Biological Expression Language, or BEL, a knowledge representation language that represents scientific findings as causal relationships that can be annotated with information about biological context, experimental methods, literature sources, and the curation process.
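To make the idea of an annotated causal relationship concrete, here is a rough Python sketch of one way such a finding could be held in memory. It does not use BEL syntax or any Selventa software: the `CausalStatement` class, its field names, the `with_annotation` helper, and the placeholder entities are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CausalStatement:
    """One finding: a cause, a relation, an effect, plus annotations.

    Hypothetical structure; not the BEL data model.
    """
    subject: str
    relation: str  # e.g. "increases" or "decreases"
    obj: str
    annotations: Dict[str, str] = field(default_factory=dict)


def with_annotation(statements: List[CausalStatement],
                    key: str, value: str) -> List[CausalStatement]:
    """Keep only statements whose annotation `key` equals `value`."""
    return [s for s in statements if s.annotations.get(key) == value]


# Placeholder findings; the entity names and sources are invented.
findings = [
    CausalStatement(
        "protein_A", "increases", "process_B",
        {"tissue": "liver", "method": "knockout", "source": "paper_1"},
    ),
    CausalStatement(
        "protein_A", "decreases", "process_C",
        {"tissue": "lung", "method": "assay", "source": "paper_2"},
    ),
]

liver_findings = with_annotation(findings, "tissue", "liver")
print(liver_findings)
```

Keeping context, method, and source as annotations on each statement is what lets the same collection of findings be filtered for different uses, for example restricting to one tissue before reasoning over the causal links.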
Pratt said that Selventa plans to release BEL as an open source language in the third quarter of this year and that it will be the firm's first offering for the community.
Following his presentation, Pratt told BioInform that offering the tool under an open source license is "consistent" with Selventa's revised strategy, announced last December, when it changed its name from Genstruct and decided to emphasize its role as a data analysis partner for drug developers (BI 12/03/2010).
To help achieve this vision, Selventa "will make the BEL Framework available to the community to promote the publishing of biological knowledge in a form that is use-neutral, open, and computable," Pratt said, adding that the company's pharma partners have been "extremely supportive" of the move.
Although the language has been in use in the Genstruct Technology Platform for eight years, Selventa's developers are preparing for its official open source release by working on a "new build" of the legacy infrastructure that's "formalized, revised, and streamlined."