In 2001, Joyce Mitchell took a sabbatical from the University of Missouri-Columbia’s department of health management and informatics to help the National Library of Medicine plan a new information resource to present disease-related data from the Human Genome Project to clinicians and the general public. The resource, the Genetics Home Reference (http://www.ghr.nlm.nih.gov/), came online in May 2003, and was developed to help consumers answer their health questions using information from publicly available genome databases and other biomolecular data resources. BioInform spoke to Mitchell recently about the challenges of making this stream of information — which many experts have a hard time navigating — easily available to the lay community.
How did you get involved in this project?
Dr. [Donald] Lindberg, who is the director of the National Library of Medicine, had invited me to work at the NLM as part of a sabbatical, and asked me to help create a consumer health website that would try to explain the health implications of the Human Genome Project. He said it was really coming from his conversations with library patrons, as well as legislators and other folks who said, ‘Now that the Human Genome Project is all done, and you know the sequence and it’s supposed to revolutionize healthcare, where’s the beef? What are you finding out and how does it relate to me?’
So my background is in informatics and also in genetics. I have a PhD in population genetics, but I also did a fellowship in clinical genetics and am board certified as a PhD medical geneticist, so I had kind of the background in medical genetics as well as informatics and all of the various tools and techniques in what goes into building these databases to try and take some kind of a look at this.
So [the Genetics Home Reference] is trying to bridge the gap between a consumer health question [and genomic data]: ‘I have a specific health condition in my family. What are the genetics of that? And what can I find out about the genetics of that? And is the Human Genome Project working in my area?’ There are all kinds of questions that a person could pose, but a consumer could go to the Genetics Home Reference and start finding out a fair bit of information about a specific health condition and about genes that have links to health conditions, and get all the way to the sequence if they want. They could go into MedlinePlus, which is the NLM’s primary system for the public, and they could ask a question in MedlinePlus, get directed to the Genetics Home Reference, and then get all the way down to the NCBI resources, where they could find whatever they want about introns, exons, sequences, etc.
What kinds of challenges did you encounter in making these genomic resources, which are really geared toward molecular biologists and other experts, accessible to the layperson?
Well, the first challenge is the complexity of the data, and it’s going to be complex no matter what you do, and it gets more so and not less so. So that challenge isn’t going to go away, it’s just that you’ve got to figure out how to direct people to the portions of the data that make sense to them. And at the moment, it really takes someone who’s very knowledgeable in the field to really navigate their way around these things. It’s not for the weak of heart. Secondly, the dynamic nature of the data is not going to change either. So anybody who thinks that [a current bit of information] is a fact and it’s going to stay a fact forever has to change that mindset. People know that change happens in the world, but this is such rapid change.
The third challenge is that there are so many systems out there, and so many things, and they’re all focused on different areas. I actually think that the Genetics Home Reference is trying to link you to all these various related aspects. I see it as a bridging system: it’s trying to bridge communities and bridge resources. And I think that’s a direction of research and development that could be promoted more broadly in the field. You certainly have to have detailed systems focused on certain areas, but bridging resources could be very helpful for a wide variety of folks.
The fourth challenge is that the data and knowledge representations are not standardized. It really highlights the need for some formal training in informatics all the way down into the biological sciences and the other sciences that are involved in creating these resources. A lot of these resources grew up when there wasn’t a lot of formal training available, and a lot of the training that goes on now in the programs is being created anew, instead of understanding that there’s a bigger field of informatics, and health informatics has dealt with these issues for years and years and years, and has guidelines and some experience and some techniques that can be used. But there are a lot of things being developed de novo that are, I’m sure, very useful for the people working on them, but are in some cases just reinventing the wheel. And oftentimes when you’re working in an area having to do with standards, then you don’t need to develop new standards; you need to build on the old ones.
It also points out the crying need for a list of synonyms for all the genes. There’s no place where you can go to get all the synonyms.
So, if these obstacles make it difficult for an expert to navigate these resources, it must be pretty much impossible for a layperson.
Oh yeah, it’s impossible. It’s hard for everyone, because you may be an expert on your specific set of genes, but there’s nobody who’s an expert on all the genes.
Does the Genetics Home Reference use an automated system of links among these different resources?
Yes, but not everything is automated. It’s done by a small group of people who have content expertise and another group of development people who are pulling together all of this data from all of these various links and resources, and as we go forward, we do more of it on an automated basis.
How do you account for the complexity of the data, and the rapid change, and the rest of the challenges you identified?
Well, we can’t keep up with all of them. We started just with a small group and are just trying to figure out how we can ramp up to all of the health implications that are out there, and it’s going to take a long time to get all of that together. So it just claims to be a subset, and it’s growing and we’re working at it. There’s no way that we can keep up with all of those things. We are putting together some tools and techniques that are helping to alert us for when changes happen in the underlying databases. So, for example, when HGNC [HUGO Gene Nomenclature Committee] changes a gene name, then we have a system that will send us an alert that says the gene name is changed. We check once a week all the gene names that we use, and if a gene name is changed, we’ll go in and change the system accordingly. So we’re kind of doing things like that as we go forward in order to use tools to help us keep up with these underlying changes in all these databases.
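The weekly gene-name check described above can be pictured with a minimal sketch. It assumes the site keeps a table of the gene symbols it uses, keyed by a stable identifier, and compares it against a fresh copy of the nomenclature data. The function name, the record type, and the identifiers and symbols in the usage note are all hypothetical; nothing here is taken from the actual Genetics Home Reference system or from any HGNC interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SymbolChange:
    """One tracked gene whose approved symbol no longer matches ours."""
    gene_id: str
    old_symbol: str
    new_symbol: str


def find_renamed_genes(tracked, current):
    """Compare the symbols a site uses with the current approved symbols.

    tracked: dict mapping a stable gene ID to the symbol the site displays.
    current: dict mapping the same IDs to the currently approved symbol
             (assumed to come from a periodic download; not modeled here).
    Returns a list of SymbolChange records, ordered by gene ID.
    """
    changes = []
    for gene_id, old_symbol in sorted(tracked.items()):
        new_symbol = current.get(gene_id)
        # Skip IDs missing from the fresh data; only report real renames.
        if new_symbol is not None and new_symbol != old_symbol:
            changes.append(SymbolChange(gene_id, old_symbol, new_symbol))
    return changes
```

Run once a week, a check like this would produce the alert list; with made-up data, `find_renamed_genes({"ID:1": "GENEA", "ID:2": "GENEB"}, {"ID:1": "GENEA", "ID:2": "GENEB2"})` reports only the second gene as renamed. Keying on a stable identifier rather than on the symbol itself is the design choice that makes a rename detectable at all.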
How large of a staff do you have working on this?
We have about four content people and five systems people, so fewer than ten.
Do you have any sense of how many users you have so far?
Thousands, but I don’t have any specifics. We were tracking some data, but I haven’t looked at it in a while. We started last May, and it was growing, and it certainly is not at the level of some of the resources that are at the NLM and NCBI, but it gets more traffic all the time.
Do you have a way of tracking what portion of users are consumers and laypeople rather than healthcare professionals or others with some medical or biology background?
I’m sure it’s a combination, and there isn’t any way to tell whether they are consumers or not. But I’m sure it’s a combination of health professionals and consumers. We do get things directed through MedlinePlus, which is primarily consumer-side. But health professionals find it quite useful.
Tell me about how you’re using ontologies to improve the navigation between these resources.
We are actually using the Gene Ontology. We don’t use the full Gene Ontology, but we are using the higher-level portions to help with browsing capability. So if people needed to go in and browse by health conditions, then we use the MeSH [Medical Subject Headings] hierarchy to do that, the upper levels of MeSH. Then we felt like we needed to let them browse by the genes, too, so the best thing around is the Gene Ontology; so they can browse by looking at various levels of these hierarchies that are in the Gene Ontology: the biological processes, and [molecular] function, and cellular locations. So if they wanted to go in and find all the genes that are inhibitor enzymes, then they could do that. And they could look to see what kind of health implications that specific set of genes has.
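Browsing by the upper levels of a hierarchy, as described above, amounts to rolling specific annotations up to their ancestor categories. The sketch below illustrates the idea on a toy hierarchy. The term names, gene names, and function names are hypothetical stand-ins, not actual Gene Ontology or MeSH content, and the code is not the site's implementation.

```python
def ancestors(term, parents):
    """Return every term reachable by following parent links upward.

    parents: dict mapping a term to a list of its parent terms, so a
    term may have more than one parent.
    """
    seen = set()
    stack = [term]
    while stack:
        node = stack.pop()
        for parent in parents.get(node, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def genes_under(category, annotations, parents):
    """List genes annotated with the category or any of its descendants.

    annotations: dict mapping a gene name to the list of terms it carries.
    """
    return sorted(
        gene
        for gene, terms in annotations.items()
        if any(t == category or category in ancestors(t, parents) for t in terms)
    )


# Toy data, invented for illustration only.
PARENTS = {
    "enzyme inhibitor activity": ["enzyme regulator activity"],
    "enzyme regulator activity": ["molecular function"],
    "kinase activity": ["catalytic activity"],
    "catalytic activity": ["molecular function"],
}
ANNOTATIONS = {
    "GENE_A": ["enzyme inhibitor activity"],
    "GENE_B": ["kinase activity"],
    "GENE_C": ["enzyme regulator activity"],
}
```

With this toy data, `genes_under("enzyme regulator activity", ANNOTATIONS, PARENTS)` picks up `GENE_A` and `GENE_C`, while asking for the top-level `"molecular function"` returns all three genes. Exposing only the upper levels keeps the browse list short while still reaching every gene annotated further down.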
What are your immediate plans for improving the resource, or encouraging others to improve the underlying resources that it draws from?
Well, I think the underlying resources just get improved as people bring to their attention the fact that there’s something they would like to do that they can’t do. So as we build the Genetics Home Reference and work with all of these other systems to interact with them and interlink our systems, we can pass our ideas to them, and they give us ideas too on how we can improve our system.