By Vivien Marx
This article was posted on March 10.
The World Wide Web Consortium's Health Care and Life Sciences Interest Group and members of the National Center for Biomedical Ontology are wrapping up work on the first version of a "patient-centric" ontology for translational medicine.
Elgar Pichler, a member of the W3C HCLSIG and a co-developer of the Translational Medicine Ontology, discussed the project at the Conference on Semantics in Healthcare and Life Sciences held last week in Cambridge, Mass. In his talk, he said that the TMO stands to help the pharma industry link the early-stage, pre-clinical, clinical, approval, and post-approval phases of the drug-discovery pipeline and answer the "hundreds of questions" that pop up during development as new findings are generated.
"Communication across the pipeline is absolutely essential," Pichler said. An ontology dedicated to bridging those areas can help solve some of the challenges posed by the many data sources in drug discovery.
Pichler is an independent consultant and formerly a group leader with AstraZeneca. Other members of the W3C HCLSIG include researchers from AstraZeneca, Biogen Idec, Boehringer Ingelheim, Daiichi Sankyo, J&J Pharmaceutical Research and Development, and Pfizer, as well as academic researchers and consultants.
Susie Stephens, co-chair of the W3C HCLSIG, told BioInform during the conference that while there are many domain ontologies, such as the Gene Ontology, as well as a number of healthcare-related ontologies, "there is nothing that pulls those things together, which is what really is needed for translational medicine."
"I think the same body of interconnected data is needed for pharma as it is for physicians," she said. That body of data must span discovery, clinical trials, and healthcare, and then offer different interfaces depending on who is working with it.
Stephens, director of biomedical informatics in J&J's Pharmaceutical R&D division, said that the first version of the Translational Medicine Ontology has just been completed. "We're writing up the work and hoping to get it out there," she said. "I would really like to drive community engagement as much as possible."
"We've also built an initial use case around that," she said. The use case involved pulling together data on drugs, side effects, and pharmacogenomics, along with "fake" patient data and formulary data.
As the HCLSIG team stated in a Nature Precedings note published last August, the need for the ontology is connected to several developments in the pharmaceutical industry, including an increasing focus on personalized medicine and the trend to develop companion diagnostics for guiding therapy.
"Such translational medicine strategies require that traditionally separate data sets from early drug discovery through to patients in the clinical setting be integrated, and presented, queried and analyzed collectively," the scientists wrote. "Ontologies can be used to drive such data integration and analysis; however, at present, few ontologies exist that bridge genomics, chemistry, and the medical domain."
To hammer out the TMO, the group chose an iterative, practice-based approach, Pichler explained. The group first tallied the "very diverse" set of "actors" (biologists, cheminformaticians, medicinal chemists, statisticians, and others) and the roles they play in drug discovery in order to define the scope of the ontology.
The group also looked at the types of questions these actors ask of the data in the course of their work, and at which ontologies they currently consult to help answer those questions.
For example, according to the W3C HCLSIG website, a statistician selecting statistical models to analyze data will draw on the Ontology for Biomedical Investigations or resources from the Sequence Ontology Project. An in vivo biologist looking for previous experiments on a given target will draw on Entrez Gene or resources from the Generic Model Organism Database project, and a cheminformatician might draw on information and ontologies provided by the Biological Pathways Exchange or the European Bioinformatics Institute's Chemical Entities of Biological Interest resource.
As Pichler explained, the W3C group mapped terms that came up in specific use cases and aligned them to identify "candidates" that could be whittled down to the concepts included in an overarching ontology. Among the resources they used are the National Center for Biomedical Ontology's BioPortal and the National Library of Medicine's Unified Medical Language System.
Map it Up
One such use case involved Alzheimer disease and included a series of steps, from when a patient first reports symptoms to a physician through to the course of events that lead to the consideration of this patient for a clinical trial. The working group then looked at which data resources would be needed for this use case.
Once the translational medicine concepts such as targets or interventions are identified, "you start your first mapping" to the reference ontologies, Pichler explained.
To test the ontology, datasets need to be identified and the data must be "RDF-ized," that is, converted to RDF and loaded into a triple store so that sample queries can be run against it, Pichler said. The Resource Description Framework, or RDF, is the W3C's standard language for representing information on the semantic web. RDF statements are expressed as triples in subject-predicate-object form, so they can be collected and searched as machine-readable graphs.
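To make the triple model concrete, here is a minimal sketch in Python. It is not how a production triple store works (those are typically queried with SPARQL); it simply keeps triples in a set and matches patterns in which `None` acts as a wildcard. Every identifier with the `ex:` prefix is a hypothetical placeholder rather than an actual TMO term, and the patient records are invented, in the spirit of the "fake" patient data used in the group's use case.

```python
from typing import Iterator, Optional, Set, Tuple

# A triple is (subject, predicate, object); "ex:" names are hypothetical placeholders.
Triple = Tuple[str, str, str]


class TripleStore:
    """Holds RDF-style triples in memory and answers simple pattern queries."""

    def __init__(self) -> None:
        self._triples: Set[Triple] = set()

    def add(self, subject: str, predicate: str, obj: str) -> None:
        self._triples.add((subject, predicate, obj))

    def match(
        self,
        subject: Optional[str] = None,
        predicate: Optional[str] = None,
        obj: Optional[str] = None,
    ) -> Iterator[Triple]:
        """Yield triples that fit the pattern; None matches anything."""
        for s, p, o in sorted(self._triples):
            if subject is not None and s != subject:
                continue
            if predicate is not None and p != predicate:
                continue
            if obj is not None and o != obj:
                continue
            yield (s, p, o)


# "RDF-ize" a few invented records into triples.
store = TripleStore()
store.add("ex:patient1", "ex:hasDiagnosis", "ex:AlzheimerDisease")
store.add("ex:patient1", "ex:takesDrug", "ex:donepezil")
store.add("ex:patient2", "ex:hasDiagnosis", "ex:AlzheimerDisease")
store.add("ex:donepezil", "ex:hasSideEffect", "ex:nausea")

# Sample query 1: which patients carry an Alzheimer disease diagnosis?
patients = [s for s, _, _ in store.match(predicate="ex:hasDiagnosis", obj="ex:AlzheimerDisease")]

# Sample query 2: follow two edges, patient -> drug -> side effect.
side_effects = [
    effect
    for _, _, drug in store.match("ex:patient1", "ex:takesDrug")
    for _, _, effect in store.match(drug, "ex:hasSideEffect")
]

print(patients)      # ['ex:patient1', 'ex:patient2']
print(side_effects)  # ['ex:nausea']
```

The second query chains two patterns, hopping from a patient record to a drug and then to a side effect. That kind of traversal across linked records is what makes the graph form convenient for joining data that originally lived in separate sources.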
Pichler said his wish list for the TMO includes having scientists deliver data in RDF format when they publish, "so we don't have to reinterpret their data." As he and his colleagues work through these steps of ontology development, involving scientists in this way would "make life so much easier," and the data would better reflect what researchers themselves mean.
Another item on his wish list is a "super-duper mega-mapper," which could help plug holes he sees in some resources. For example, he said, the NCBO BioPortal's resources and the Unified Medical Language System do not explain how the mappings between terms are made. Recording that information would help capture relevant provenance data as the mapping work moves forward, Pichler said.
Pichler said he would also like to see federated queries enabled with "access policy mediation." For example, because it is "very hard" to gain access to clinical data, researchers could assist TMO developers by working to develop policies and regulate access to the data.
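A mapping that carries a record of how it was made could look something like the following sketch. The `Mapping` fields, the `MappingRegistry` class, the curator names, and the `refA:`/`refB:` term identifiers are all hypothetical; the relation labels borrow the names of SKOS mapping properties (`exactMatch`, `broadMatch`), but nothing here reflects how BioPortal or UMLS actually store mappings.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Mapping:
    """One term-to-concept mapping plus a record of how it was made."""
    source_term: str   # term as it appeared in a use case
    target_term: str   # concept in a reference ontology (hypothetical IDs)
    relation: str      # e.g. "exactMatch" or "broadMatch", named after SKOS properties
    method: str        # how the mapping was produced, e.g. "lexical match"
    asserted_by: str   # person or tool responsible for the mapping


class MappingRegistry:
    """Stores mappings keyed by source term so provenance travels with them."""

    def __init__(self) -> None:
        self._by_source: Dict[str, List[Mapping]] = {}

    def add(self, mapping: Mapping) -> None:
        self._by_source.setdefault(mapping.source_term, []).append(mapping)

    def provenance(self, term: str) -> List[Tuple[str, str, str, str]]:
        """Return (target, relation, method, asserted_by) for every mapping of term."""
        return [
            (m.target_term, m.relation, m.method, m.asserted_by)
            for m in self._by_source.get(term, [])
        ]


registry = MappingRegistry()
registry.add(Mapping("drug target", "refA:Target", "exactMatch", "lexical match", "curator-1"))
registry.add(Mapping("drug target", "refB:Protein", "broadMatch", "manual review", "curator-2"))

for row in registry.provenance("drug target"):
    print(row)
```

Keeping the method and the asserting party on each mapping, rather than storing only the term pairs, is what lets a later reviewer judge whether a given alignment can be trusted.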
Stephens said that pharma's growing interest in translational medicine is driving it to look toward the semantic web and the TMO. "That really involves being aware of data sources from different parts of the business and from different industries, and [the need to] connect them together," she said.
As pharma becomes more networked and collaborative, there must also be easier access to data and better ways to analyze and mine it. The TMO is intended to address this problem by closing current communication gaps, she said.
Another ontology-related project underway in the W3C HCLSIG addresses collaborations between pharma companies and external partners that involve compound information. This effort is still "at a very early stage," Stephens said.
Currently, Stephens noted, "there is not a standard way" to represent compound data. While there is ChEBI for small molecules, "we also have a lot of large molecules," she said.
Furthermore, while ChEBI has information about structures, "it doesn't have as much information about the structure-activity relationships as we would like it to have, or toxicity information, or information about how easy it would be to scale up production of that compound," Stephens said.
Stephens said that the working group is currently exploring the feasibility of such a project, which "would use an ontology as the schema."
Semantic technologies, which make it possible to separate the data from the schema, should make it easier for a company to extend that ontology internally. "We might not want to tell the whole world that we are interested in capturing certain sorts of types of data when we participate in a collaboration," Stephens said. "So we could have our own proprietary extensions to an ontology more easily than we could have our own proprietary extensions to a relational schema, which isn't that easy for sharing."
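The idea of a proprietary extension can be sketched as two separate sets of triples: a shared ontology and a company-internal extension that points into it. The `rdfs:subClassOf` and `rdfs:domain` predicates are real RDF Schema terms, but the `ex:` classes and the `acme:` prefix (standing in for a hypothetical company) are invented for illustration. The internal view is just the union of the two sets, so the shared ontology itself never changes.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]
SUBCLASS = "rdfs:subClassOf"

# Ontology shared with collaborators (hypothetical "ex:" terms).
shared: Set[Triple] = {
    ("ex:SmallMolecule", SUBCLASS, "ex:Compound"),
    ("ex:LargeMolecule", SUBCLASS, "ex:Compound"),
}

# Proprietary extension kept in its own graph (hypothetical "acme:" terms).
internal_extension: Set[Triple] = {
    ("acme:Antibody", SUBCLASS, "ex:LargeMolecule"),
    ("acme:scaleUpRisk", "rdfs:domain", "ex:Compound"),
}


def superclasses(cls: str, triples: Set[Triple]) -> Set[str]:
    """Return every class that cls is a subclass of, following chains transitively."""
    result: Set[str] = set()
    frontier: List[str] = [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            # The "o not in result" check also guards against cycles.
            if s == current and p == SUBCLASS and o not in result:
                result.add(o)
                frontier.append(o)
    return result


internal_view = shared | internal_extension
print(sorted(superclasses("acme:Antibody", internal_view)))  # ['ex:Compound', 'ex:LargeMolecule']
print(sorted(superclasses("acme:Antibody", shared)))         # []
```

Querying the shared graph alone returns nothing for `acme:Antibody`, because collaborators never receive the extension. The equivalent change to a relational schema would typically mean altering tables that collaborators also depend on, which is the contrast Stephens draws.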