Just over a year ago, at the Consumer Genetics Conference in Boston, Illumina CEO Jay Flatley demonstrated a prototype iPhone app called MyGenome that would allow users to easily browse, visualize, and share their genomic data on the Apple smartphone.
Since then, the company has demonstrated the app at several other conferences and has also developed a similar prototype for the iPad, which Flatley demonstrated at this year's CGC.
It may be a while, however, before the tool is available from the App Store. While Illumina has overcome many of the technical hurdles to presenting complex genomic data in a handheld device, the ability to interpret that data reliably is still lacking — particularly when it comes to whole-genome data.
Scott Kahn, Illumina's chief information officer, recently discussed the company's efforts in this area at the Genetic Alliance conference in Bethesda, Md. BioInform caught up with him after the conference to discuss the company's vision for enabling whole-genome visualization on mobile devices, as well as the challenges it faces in putting those tools into consumers' hands.
The following transcript of the interview has been edited for length and clarity.
For the last year or so, Illumina has been demonstrating prototype iPhone and iPad apps that will allow users to better visualize their genomic information. What is the company's timeline and vision for making these apps available to users?
Those are actually two separate questions for me. From a vision standpoint, we can see a lot of interest in trying to get genetic information down into the hands of individuals so that they can use it in a number of ways — some of it may be social, some of it may be interaction with medical professionals, some of it may be almost recreational.
The reason I say it's a vision is that there are all sorts of hurdles in place, not least of which is that the quality of information that describes the significance of various genetic differences is not well developed, and the data sources that are out there are quite limited. So you don't want to get information out to a wide range of people that is suspect, that has not been clinically validated, that doesn't have the input of a professional who can advise on consequences and risks, and all that sort of stuff.
In terms of a timeline, that's much harder to set out because a lot of these things are not going to be in our hands or in our control. We can help provide the encouragement, the motivation, but on the clinical validation side that's largely [going to be determined by the] medical research [community].
So I can't give a timeline, but I can definitely give the vision, which is that everyone would love to have access to the information such that they understand what it's telling them. And this is the gap that exists right now.
In terms of the vision, would you be looking to do this first, say, for genotyping data from chips that would then be linked to information from genome-wide association studies, or is this more about looking at whole-genome sequence data?
There are three types of data where you could see these sorts of tools might play out. One would be the chip data. It has the benefit, potentially, of being able to access all that's going on in the GWAS area, but if you think about it, that's also one of its challenges because most of these GWAS markers are not used in medical practice; they're not clinically validated. So while there's information out there, is it the right kind of information, broadly speaking, for a consumer to have?
When you're talking about whole-genome sequencing or whole-exome sequencing, again, the issue there is that for many of the things that you'll find there aren't necessarily descriptions of what it is, how important it is, what other factors are involved.
So the data content area — the interpretation part — is probably right now where the field as a whole needs to have a step-change … in terms of the technologies that allow you to do it. [It can also be viewed in terms of] the content that's out there that is validated, that is clinically relevant, and that you can [use when you] sit down with a medical professional and they can say, 'Oh yes, this means here's what you should do.'
So there's an informatics gap right now. We have the data, and that's quite reliable — that's your genome — and the gap is the information about its significance, and that's where I think a lot of effort needs to be put to fill in those holes.
Is that a question of doing more association studies? Or is the data out there and just not integrated in a way that makes it useful for annotation?
I think there's a lot that people are still learning through both GWAS and now the richer GWAS studies that people are doing with the very wide set of markers that are available from the 1000 Genomes [Project] and related [efforts]. So that is enriching the spectrum from which people can draw, but it's also the more involved discussion around validating [the associations], understanding the clinical relevance, understanding more deeply about risks and penetrance and all those sorts of factors. It's good to know what are the islands out there, if you want to think of GWAS results as land masses, but then we need to know a lot more about what's on the island, how important is that island, what's it connected to, et cetera.
It seems like it might never get to the point where people say, 'OK, we're done.' So how will you determine when there's a critical mass of acquired knowledge that would make these apps of practical use?
It definitely is incremental. So with each new study that's published, with each additional validation, you get incrementally better.
One of the things that we're doing to try to both understand it and hopefully encourage it is we have a group called the Genome Informatics Alliance that meets once a year. The goal of that is to get experts from a wide-ranging set of disciplines — definitely some within the biology and genomics area, but often people from outside those areas — to talk about what we are doing today, how it can be improved, and what are the things that might allow us to have a step-change.
For example, [technology to] mine credit card data to find patterns that you wouldn't normally see is quite relevant to how you might mine biologic and genomic data, and yet those groups don't normally talk. So the notion of this alliance is to bring those kinds of people together to inspire one another and also to have a fusion of methodology so that we can make steps faster.
If I look at the first meeting, which was last year, and then the meeting that we had this year, there was a huge shift. [The emphasis] last year was just, 'Can I get the data off the machine? Can I manage it and store it?' This year, people are very much processing and understanding the information in that data, and they're starting to look ahead to, 'How do I get more relevance in the interpretation of the data?'
So it's interesting how people's eyes have moved from looking at their feet to looking at what's ahead of them. I think that's the right step. We try to have these meetings once a year because it's a good touch point to not only see how things have changed but to kind of re-energize the group by bringing in people from different areas who are looking at things completely differently.
When's the next meeting scheduled?
There isn't a date yet, but they tend to be in the April/May timeframe.
We also benefit from what goes on at [the American Society of Human Genetics meeting]. That's a good touch point for the overall community. [The Advances in Genome Biology and Technology conference] obviously is very specific to what's going on in the sequencing world, and I think those are two things that allow us to bring together the right kinds of groups. We brainstorm ideas with Elaine Mardis and David Dooling at [Washington University], for example. We don't want to get ourselves in a box. We don't want to be dogmatic. We'd like to be open and creative, so we take a period of one to two months just to brainstorm ideas of how best to nudge the field a little bit.
In terms of the idea of getting this data onto a handheld device, what are the challenges that are involved with presenting that data in that format, as well as for a consumer as opposed to an expert?
There's kind of a mix of things. One is there is a whole bunch of technology that has to be dealt with. One of the challenges we had with the iPad and the iPhone was, 'Can you store a genome, and the annotations that you might want to interact with, on a device like that?' The answer was 'yes'. We had to do some work, obviously.
So that was a technical hurdle that immediately got us to the question of, 'What are the right annotations to store and how would you like to interact with it?'
So at these GIA meetings, we tend to bring together hardcore biologists with [bioinformatics experts] because there is common ground — it's actually very fertile common ground — between the technologists and the biologists.
Does the traditional genome browser format, like the University of California Santa Cruz browser or Ensembl, work for these mobile devices and the intended user base, or do you need to change the whole way this data is represented?
You can definitely today use a [genome] browser on an iPad, but it's not very satisfying for people outside the field, and it's because the questions that you ask as you're browsing the data are different. You have a context, which is that you know deeply about a genome and it provides an extremely useful tool to navigate through it.
In contrast, the non-scientists know about themselves and they want to ask things that are very focused, not from the genome perspective, but on a phenotype. So they might ask a question like, 'What do I know about my statin use based upon my genome?' If you think about asking that question in the UCSC browser, you can't do it because it's flipped [the question] on its head.
So in some respects, what we've been prototyping, and why we've been prototyping things on these portable devices is [to find out], 'Are there different ways that people can ask questions? What are the right ways to ask those questions? Can it flexibly interact with the data?' And then you find that you have more than one tool, more than one visualization, dependent on who is the one viewing and the kinds of questions they want to ask.
This is a rough analogy: When you look at your banking data, when you look at your own account, you want to know what checks cleared and all that kind of stuff. But when your bank looks at your data, they want to see things completely differently. It's important [for them to know] what checks you wrote and all that, but they [also] want to know about your spending patterns and whether there's fraud going on, and so on. So even though the data behind the scenes can be derived from the exact same source, depending on who's trying to view it and what their objectives are, it's going to be completely different.
So it's not going to be a browser because that doesn't really capture likeness and not-likeness. It's kind of a completely different way of viewing the data.
Have you been getting consumer input as part of this process?
No, and I want to be careful in explaining that this is a prototype and not something that we're looking to release. We've tended to show it at groups like Genetic Alliance, and I know [Illumina CEO] Jay [Flatley] showed a prototype at the Consumer Genomics meeting in Boston, but we haven't run any formal focus groups yet. Obviously there is an enormous amount of activity in the whole consumer genomics area, so we're just being mindful of what's going on there.
Are you referring to the ongoing regulatory issues in the direct-to-consumer genomics industry? [Editor's note: The US Food and Drug Administration has recently signaled that it intends to regulate these firms more closely but has not provided guidance on its thinking.]
Absolutely, and it's all very appropriate. We're very mindful that this kind of data is fundamental to the individual, but how you use it, right now, requires someone who has training to give you advice on how to read the tea leaves, if you will.
In terms of how this might play out, is it possible that these types of applications would find their first use in a doctor's office — perhaps as part of an electronic medical record?
I think there are two exciting areas that might find use quickly, relevant to the individual. One would be in a doctor's office where it would act through an electronic medical record, and that's why we built into the first app the ability to upload and download information from [Microsoft's] HealthVault, HealthVault being one of several of these kinds of records. It just seems like the right mechanism to get information to the physician.
The second area that could be very useful would be on the pharmacy side, where, again, it could be through the medical record, but you could imagine going to your pharmacist and they could just very quickly screen that the medication you're getting, and potentially the dosing that you're getting, is consistent with your pharmacogenomic markers. So one classic one is Plavix. 'Do you respond to Plavix or not?' These are the kinds of [questions] I think that are practical, that are probably within the knowledgebases that we have, and that would involve the pharmacist or the physician to interpret.
On the pharmacogenomics side, many of the markers are known and the consequence is known. It's just that people don't wear a bracelet that says, 'I'm CYP2C19*7.'
But you could have that information on your iPhone.
Right, you could have it on your iPhone or iPad and maybe you can share that, or maybe you go to CVS and they already have that there so their systems automatically make sure there's no possibility for mistake. Or maybe when you're taking a drug they can say, 'You really should avoid eating [certain foods].' Because there are some food/drug interactions that maybe we don't always see, or maybe you don't read the label carefully enough, but it's relevant because you have certain genetic markers that put you a little more at risk. So wouldn't it be nice for the pharmacist to just give that simple warning?
So those are the two areas that I imagine, and obviously there are huge hurdles to get over because there's so much going on with medical records and standardization of medical records. I think there's a huge impetus to get that evolved, and it is evolving very quickly.
Ultimately, what is Illumina's goal with these prototypes that you've developed? It sounds like this is more of an exercise to stay abreast of the bleeding edge of the field rather than something that you plan to productize any time soon.
It's definitely to participate in how the use of genomic data will evolve. It's trying to do that in a responsible way, obviously, and I think it's useful to break out of the dogma of how it's used today, and this has been a mechanism for us to do that. It gets people to think just for a moment, 'What if,' and then participate in the discussion that follows from that. Because I think that is how leaps in utilization are going to happen.
In terms of getting the data to the point where you could make it available to users in this format, what would need to happen in order to improve the available annotations?
Wouldn't it be great if what happened in the web search arena happened with medical data? Say Google decided that it was going to organize and collect medical knowledge — it would find ways to validate it, it would link it to clinical information. It's a project that sounds so outrageous that it's not possible, but that's the kind of stuff that companies like that have done. Maybe it's not Google. Maybe it's Microsoft with what they're doing with Bing or maybe Yahoo! comes back into it. It would be [an effort] to organize, to coalesce, to curate, to validate information such that it has a weight — how important is it; it has a value — whether it's significant or potentially just advisory; and to put it in a place where the data could be queried against it to extract information relevant to an individual.
So it would be more or less automated, rather than a manual curation exercise?
I think so. One thing that people have talked about is the power of group curation. It's an interesting concept and I'm not sure how or if it plays in this arena.
What are you keeping your eyes on in terms of progressing forward with this vision? Are there any specific technologies or projects that you think will bear fruit in the shorter term, or at least help advance the longer-term goal?
We follow a lot of what's going on in the rich GWAS area. We follow a lot of what's going on in validation of existing GWAS hits. We also are getting involved in the technologies that might allow you to fuse information generically, [such as] generic web and service-oriented technologies that allow this to be done in an automated and open fashion. And, actually, that's a lot on our plate.