Q&A: IO Informatics' Charles Mead on Efforts to Adopt Semantic Web for Translational Medicine Use


Charles Mead, a member of IO Informatics' scientific advisory board (BI 6/7/2013) and co-chair of the W3C Healthcare Life Sciences (HCLS) working group, has more than 35 years of experience in healthcare informatics, including standards development and large-scale enterprise service-oriented architecture (SOA) development.

Mead, who is also the director of healthcare information technology at Octo Consulting Group, has been involved in standards organizations such as the Clinical Data Interchange Standards Consortium (CDISC) and Health Level 7 (HL7), serving on the boards of directors of both organizations.

Currently, he focuses on the use of semantic web tools and technologies for standards and data representation across the translational medicine continuum, and in SOA to enable run-time contract interface negotiation and automated workflow configuration.

This week, BioInform interviewed Mead about his involvement with the semantic web and ongoing efforts to bring the technology into life sciences, healthcare, and clinical trials. What follows is an edited version of the interview.

Let's start off with a little bit of background.

I did 10 years as an emergency room physician and then got into computer science, [and] made my way through the standards world — particularly the health care standards, HL7, and clinical trial standards, CDISC. I've just finished a seven-year engagement with the National Cancer Institute working in [the] caBIG program and have been working in semantic web for about two and a half years as co-chair of the Healthcare Life Sciences working group. I have been associated with IO Informatics for about six months.

Why did you make the transition from physician to computer scientist?

When I was in medical school … and the first five or six years after, I was balancing working in physiologic signal processing — ambulatory electrocardiogram processing in particular — at the biomedical computer laboratory at the Washington University School of Medicine in St. Louis, and from that background … I got interested in how machines and computers work. I was able to take advantage of a program … specifically set up to cross-train people with healthcare or biology backgrounds in computer science. So I went back and got a master's in computer science as part of that program. From there, I wrote an application for anesthesiologists using early pen-based computing technologies that a venture company picked up for application in homecare and tried to turn into a company. From that experience, I got interested in how you turn ideas into commercial products. Then I spent five years at Oracle because [they] were developing their Healthcare Transaction Base on the HL7 Reference Information Model [RIM] and I had been one of the original people that worked on that. I left Oracle [after] the NCI asked if I would come and work on the caBIG project.

When did you become interested in semantics?

When I was in medical school in the '70s, we talked about semantic interoperability because clinical information systems were starting to come into the picture. And of course, I was deeply involved in the problem after starting to participate in HL7 in the mid-'90s. It's always been a really difficult problem to solve. I spend a lot of time talking to people and explaining to them what's different about the semantic web [and] how it is different from anything else that's preceded it. First and foremost is the fact that the semantic web tools and technologies are about semantics … and just semantics. There are no technology wrappers that obfuscate the essence of the semantics. Semantic inconsistencies and disagreements still have to be resolved by people.

However, when you are dealing with semantics stored in relational databases or transported in XML documents, you've got the semantics embedded in syntactic technology structures like tables or nested trees, so that every time you worry about semantic interoperability, you first have to worry about syntactic interoperability … and in that context, small non-semantic changes — like differing column names for the same concept, or different nesting structures for essentially the same semantics — raise unnecessary barriers or result in difficult-to-scale-or-extend, one-off solutions. In contrast, with semantic web technologies, everything is about semantics. There are no syntactic hurdles per se to overcome. The semantics that you represent in the graph as you define the problem space are exactly the semantics that get serialized on the wire in the solution space. It doesn't make the semantic interoperability issues easier. However, it certainly makes them the first-class citizens they deserve to be if the goal is computable semantic interoperability.
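To make Mead's point concrete, here is a hypothetical sketch (not from the interview; all names such as `ex:patient42` and `ex:hasDiagnosis` are invented) of a fact represented directly as subject-predicate-object triples, with no table or tree syntax wrapped around it. Triples are modeled as plain Python tuples, and the pattern-matching query is loosely analogous to a SPARQL basic graph pattern:

```python
# Illustrative sketch: the same clinical fact as bare triples.
# Two source systems might store "patient 42 has diagnosis hypertension"
# in differently shaped tables or XML documents, but as triples both
# reduce to the same subject-predicate-object statements.
triples = {
    ("ex:patient42", "rdf:type", "ex:Patient"),
    ("ex:patient42", "ex:hasDiagnosis", "ex:Hypertension"),
    ("ex:patient42", "ex:enrolledIn", "ex:Trial7"),
}

def match(graph, s=None, p=None, o=None):
    """Return triples matching a pattern; None is a wildcard,
    loosely analogous to a SPARQL basic graph pattern."""
    return {
        (ts, tp, to) for (ts, tp, to) in graph
        if (s is None or ts == s)
        and (p is None or tp == p)
        and (o is None or to == o)
    }

# "What diagnoses does patient 42 have?"
diagnoses = match(triples, s="ex:patient42", p="ex:hasDiagnosis")
```

Because the graph carries no syntactic wrapper, the question is asked directly against the semantics; there is no column name or nesting structure to reconcile first.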

My task as an HCLS co-chair focused on the healthcare and clinical trial spaces has been evangelizing the value propositions of the semantic web. The life sciences space has been leading in terms of [its] adoption of semantic technologies … compared to what has been done in healthcare and clinical trials. However, those domains are now starting to catch up.

Why is adoption slower on the healthcare and clinical trials side of things than it is for the life sciences?

I think that the challenges on the healthcare side are several. First, the basic models themselves are pretty complicated compared to the instance data that is expressed. This is in contrast to the situation in the life sciences, where the basic models — base pairs, amino acids, et cetera — are relatively small and fixed compared to the huge and increasing volumes of data that are emerging. So the application architectures that use semantic web technologies in life sciences look a little different from the applications … in healthcare and clinical trials.

On the other hand, when we look at the personalized medicine and translational medicine continuum … and focus on how we integrate all that data … it seems to me that semantic web technologies are the only way that we are going to be able to cross those barriers, simply because, in the end, integration of phenotypic and genotypic data is about sharing semantics at a very granular level.

Let's stay on that last point. Where do things stand on efforts to adopt the semantic web for translational medicine use?

I think that in the last six months, there have been some real breakthroughs. HL7 has published the content of their Model Interchange Format — the RIM, data types, and HL7 vocabulary — in the Web Ontology Language [OWL]. There is an organization called the Pharmaceutical Users Software Exchange [PhUSE] that, along with the US Food and Drug Administration, has launched a joint project to represent all of the CDISC standards in the Resource Description Framework [RDF]. I think that as we move to representing healthcare and clinical trials standards themselves in RDF, then we can of course represent the patient- or subject-level data in a semantically interoperable form using the standards as the meta models. Then you start to really be able to look at computable semantic interoperability along the entire translational medicine continuum.
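The idea of "standards as meta models" can be sketched as follows (a hypothetical illustration; the class names like `cdisc:AdverseEvent` and the subclass link are invented for this example, not taken from the actual published ontologies). A standard published in RDF/OWL supplies the schema, patient-level data is typed against it, and a tiny rdfs:subClassOf-style closure makes the instance data queryable through the standard's class hierarchy:

```python
# Illustrative sketch: a standard's class hierarchy acts as the meta
# model for instance data. Names are invented examples.
schema = {
    ("cdisc:AdverseEvent", "rdfs:subClassOf", "hl7:ClinicalObservation"),
}
data = {
    ("ex:ae1", "rdf:type", "cdisc:AdverseEvent"),
}

def types_of(instance, data, schema):
    """Collect asserted types plus all superclasses from the schema,
    a minimal stand-in for RDFS subclass inference."""
    types = {o for (s, p, o) in data
             if s == instance and p == "rdf:type"}
    changed = True
    while changed:
        changed = False
        for (sub, p, sup) in schema:
            if p == "rdfs:subClassOf" and sub in types and sup not in types:
                types.add(sup)
                changed = True
    return types
```

With the standard itself in RDF, an adverse-event record entered against the CDISC model is automatically visible to a query phrased against the HL7 model, which is the kind of cross-standard traversal Mead describes.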

Why has it taken so long?

I think it has been, at least in part, a lack of knowledge and understanding. I came in as the co-chair of the W3C's HCLS working group with serious doubts about the industrial-strength viability of the semantic web tool kit. Once I'd gotten comfortable with some of the realities that it did in fact actually 'work,' [my] absolute agenda [was] to get this into the healthcare and clinical trial standards domains. It's certainly not that I was the first person to do that. Other people have tried but … I happened to be in the right place at the right time because the tools themselves were stabilizing, maturing, and gaining traction in the life sciences space. Healthcare and clinical trials are now starting to catch up.

What attracted you to IO Informatics?

I met them for the first time two years ago when I gave a talk at [the Conference on Semantics in Healthcare and Life Sciences]. I felt they understood some of the issues around semantics, but I was focused on SOA and the clinical trial domain at the time and they were focused very much on life sciences. At the time, I didn't know enough to understand how relevant their solution could be in the clinical trial or healthcare domains. Now that I understand both the semantic web tool kit and IO's offerings in particular at a deeper level of detail, I can see that they have a solution set that is focused on one of the biggest problems we face in getting semantic web technologies into the mainstream. When people think about the semantic web, the first thing they say is 'well, it probably doesn't actually work.' But when they start seeing it might work, their biggest fear is that they have to rip out everything they've got, replace it with triple stores, and start over. IO has focused on the technical interfaces between existing legacy structures and representations of those semantics using RDF, OWL, et cetera. In other words, how to grapple with the realities of legacy data in relational stores and staff that are much more experienced with UML or SQL than with RDF or SPARQL.
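The general approach of exposing legacy relational data as triples without replacing the store can be sketched as below. This is a hypothetical illustration, not IO Informatics' actual implementation; the table, column, and predicate names are invented, and production systems would typically use a declarative mapping such as the W3C's R2RML rather than hand-written code:

```python
import sqlite3

# Illustrative sketch: a legacy relational table stays where it is,
# and a thin mapping layer presents its rows as RDF-style triples.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patient (id INTEGER PRIMARY KEY, dx TEXT)")
conn.execute("INSERT INTO patient VALUES (42, 'Hypertension')")

def rows_as_triples(conn):
    """Map each row to triples: the primary key becomes the subject,
    each column becomes a predicate. Names here are invented."""
    triples = []
    for pid, dx in conn.execute("SELECT id, dx FROM patient"):
        subj = f"ex:patient{pid}"
        triples.append((subj, "rdf:type", "ex:Patient"))
        triples.append((subj, "ex:hasDiagnosis", f"ex:{dx}"))
    return triples

triples = rows_as_triples(conn)
```

The point of the design is that SQL-literate staff keep their relational store and tooling, while downstream consumers see only the semantic representation.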

The translational medicine continuum is ultimately about semantic interoperability. Anyone who has tools that can enable that semantic interoperability will be able to find a place to play in that continuum. What I think is going to start happening is the business opportunities are going to start opening up as more people understand the value proposition and potential of semantic web tools in solving the core problems around computable semantic interoperability.

Any final thoughts?

As I said earlier, the semantic web technologies do not solve semantic interoperability problems. What they do is surface semantics as a first class citizen and I think that’s incredibly important. The fact that the models … can be built and validated by domain experts and then … be serialized on the wire … [and] the fact that the two representations are the same is a tremendous benefit because it allows you to narrow the gap that traditionally exists between the problem space representation of a particular set of semantics and the representation of those same semantics in the solution space. Historically, those representations have looked pretty different, in part because of the kind of non-semantic technology wrappers in which solution-level semantics have been packaged. Also, because the semantic web technologies are all based on solid, proven internet standards, there is a level of confidence not normally present when a 'new' technology is introduced.

Finally, semantic web technologies start with the assumption that harmonization of semantic differences can be postponed until they need to be resolved, thereby eliminating the necessity of building top-down uber-models of domain-specific semantics and relying instead on the fact that leaf-level semantic harmonization can — and will — occur in the context of the specific use cases in which computable semantic interoperability forms the core of the value proposition.
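The deferred, leaf-level harmonization Mead describes can be sketched as follows (a hypothetical illustration with invented names): two sources use different predicates for the same concept, and instead of an up-front uber-model, a small equivalence mapping — loosely analogous to an owl:equivalentProperty assertion — is applied only when a specific use case needs the sources joined:

```python
# Illustrative sketch: two sources state the same fact with
# different predicates. All names are invented examples.
clinic = {("ex:patient42", "clinic:diagnosis", "ex:Hypertension")}
trial = {("ex:patient42", "trial:condition", "ex:Hypertension")}

# Leaf-level harmonization, declared only when integration is needed,
# rather than baked into a top-down domain-wide model.
equiv = {
    "clinic:diagnosis": "ex:hasDiagnosis",
    "trial:condition": "ex:hasDiagnosis",
}

def harmonize(*graphs):
    """Merge graphs, rewriting predicates through the equivalence map."""
    merged = set()
    for g in graphs:
        for s, p, o in g:
            merged.add((s, equiv.get(p, p), o))
    return merged

# The two differently named facts collapse to one shared statement.
merged = harmonize(clinic, trial)
```

Each source keeps its own vocabulary; the mapping lives alongside the use case that needs it, which is what makes the approach scale without global agreement.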
