CHICAGO – Last month, Mount Sinai Health System elevated biomedical data veteran Andrew Kasarskis to the newly created position of executive vice president and chief data officer.
As chief data officer, Kasarskis, the former director of the Icahn Institute for Genomics and Multiscale Biology at Mount Sinai's Icahn School of Medicine, is responsible for "simplification, transparency, and use of Mount Sinai's digital assets," the institution said. He also is leading efforts for the New York-based health system to adopt performance metrics to measure the success of data infrastructure improvements and various research and clinical data initiatives.
Kasarskis, who remains a professor in the medical school's Department of Genetics and Genomic Sciences, is continuing his own research into the use of technology for pathogen surveillance, pharmacogenomics, treatment of viral infections, and chronic diseases. His genomics credentials include the development of a course in which students learned how to sequence, analyze, and interpret their own genomes.
Kasarskis reports directly to Mount Sinai Health System CEO Kenneth Davis and is working closely with CIO Kumar Chatani, as well as other clinical, research and IT leaders within the eight-hospital system. He also is drawing upon resources including Sema4 — a genetic testing company spun out of Mount Sinai in 2017 — and Icahn School's newly launched Center for Genomic Health, which seeks to integrate genomic screening into primary care.
In a recent interview, Kasarskis spoke about breaking new ground as chief data officer, the challenges of wrangling all kinds of data for a large academic health system, his long history in the field of genomics, and how he is collaborating with other leaders at Mount Sinai. Below is an edited transcript of the interview.
What does the job of chief data officer boil down to?
It's all about getting your data assets in order so that you can have them accessible for people who need the information to be able to make better decisions or to design better tests, whether you're doing it for operational efficiency or research or both.
How does that differ from what other IT leaders at Mount Sinai do?
Joseph Finkelstein, our CRIO, is responsible for enabling researchers. [CMIO] Bruce Darrow is responsible for enabling clinicians. I'm responsible for working with them to make sure that, from the moment we capture data in a primary repository to the moment it's sent out to [users], we are thinking about how to manage it most effectively for them to get the maximum value from it.
Does the molecular part of your job, the genomics part of it, distinguish what you do from the others? Certainly, genomics is being used in research a lot. It's starting to make its way into clinical care, which would be the CMIO's domain.
At Mount Sinai, we've got a history in genomics that goes back clinically I think longer than it does in research, really. As you know, we spun out Sema4 as a genomic testing company. That's our old genetic testing laboratory in the genetics department. We were doing a lot of clinical genetics and genomic testing at scale really before we did much research. We didn't do much research genomics here before [current Sema4 CEO] Eric Schadt and I showed up [in 2011]. One of the first things that I did ... was decide that our genomics facility, since you're not going to beat the Broad, WashU, Baylor, or BGI on size and scale, [would] focus on two things. One was application of technologies that were not used widely clinically or in a research context but were still evolving, niche things such as Pacific Biosciences or special sample prep things. And then actually getting really good at doing clinical sequencing. We decided to convert our entire genomics facility to a CLIA facility [from Good Laboratory Practice standards].
Sema4 now offers a leading carrier screening test and strong cancer tests and pharmacogenomics tests, and those have long been incorporated into Mount Sinai's clinical practice. Sema4 services that operation as well as many other customers both nationally and internationally. As one starts to develop a market for clinical exome sequencing and clinical whole-genome sequencing, obviously we will do more of that. But for Mount Sinai, much of that has been done in a research context, and much of that research is done in the context of our BioMe repository.
What exactly is BioMe and how are you using it?
That's at the Institute for Personalized Medicine, run by Judy Cho, and largely exploited by the new Center for Genomic Health that's co-led by Eimear Kenny and Noura Abul-Husn. That's tens of thousands of individuals with exome sequencing and it's being exploited in two ways. There's a genome-first way, which is looking at the variants and then going to the [electronic health record] and seeing what sort of associations are present. Then there is what I think really is Noura's strength, which is to look at the entire corpus of our electronic medical records for interesting variants in whatever you can measure in our population, keeping in mind we've got a very ethnically diverse population. Eimear's specialty actually is population admixture and stratification. She's adept at leveraging our enormously multiethnic population here in New York City. The goal there really is to, after correcting for all those things, look to see if there are extremes or variations in the population that are of interest, which might relate to cryptic Mendelian disease, perhaps some haploinsufficiency that was not properly appreciated. Noura is going and looking through those interesting phenotypically defined cases to see if we can find hypotheses that could be looked at for a genomic test or a simple, even locus-specific examination with a clinically validated test to identify individuals that might have previously unappreciated Mendelian causes or contributions to their phenotypic spectrums.
From a population health standpoint, there is probably a business case to be made for doing pharmacogenomics screening more broadly. We're exploring that in our health system.
You have a personal history with molecular epidemiology and pathogen surveillance. How are you bringing that experience to your new job?
For a couple of years, we have been banking every single bacteremia sample, looking at every C. diff sample as well as some prospective sampling, and recently we have been getting all the flu samples and doing some mixture of multilocus PCR typing, Illumina genotyping, Illumina sequencing, and Pacific Biosciences sequencing. If your goal is to understand the evolution of a pathogen response to antibiotic exposure, you probably are going to want to know all the auxiliary genomes, all the plasmids. You're going to want to have it completely assembled because you're talking about trying to detect single base-pair changes or heteroresistance in a population. If your goal is simply to broadly characterize what appears to be putative transmissions either within your health system or those in the community that are being admitted to your health system, that's something which could be achieved with PCR typing sometimes or even antibiograms, but more often by actually looking at the Illumina sequencing and relatively crude assemblies from that.
You're very much interested in data quality and data cleanliness. Is there anything that you're doing in the new role in that regard that differs from what you did previously?
Tons of things. We're a health system that grew by acquisition. We acquired Continuum Health Partners [in 2013] and we're now an eight-hospital system. Each of those hospitals is at a different state with regard to standardized systems both for administration and operations as well as the EMR. We're standardizing toward Epic. There's a lot to be said for administrative data in addition to actual clinical data. If you're trying to figure out gaps or care transitions or engaging with community-based organizations for our very substantial population health business, it's not actually the EHR that's the big thing there. It's actually knowing which care providers you have where, where a patient needs to go, what community-based organizations provide services they might need. … There's a lot to learn there, but I think the integration of any of that information with good laboratory results all comes down to basic data hygiene. If you have a lab result that's properly categorized and a discrete, strongly typed piece of information in the EMR and it's actually accurate, then you're in pretty good shape.
When you start throwing in genetics into that, then it becomes more complicated as well.
It does. Everyone likes to get excited about [artificial intelligence] and machine learning and things like this, but the barrier here is not machine learning. It really is not. I think it's not the barrier for almost everybody's use cases in most industries these days. You can get high school students who can take open-source machine learning packages and run them. If you have well-labeled data, Group A and Group B, then the algorithms will distinguish them. It's not that hard. The challenge is, for the application you have, can you actually get well-labeled data and get rid of the confounding variables that might lead to an erroneous result? Then can you actually verify that whatever data you used for your training set is actually representative of the circumstances under which you want to deploy that model? Then how would you measure the effect of that? How would you measure whether your model is effectively deployed or not effectively deployed?
Of course, there's a whole universe of really exciting genomics data that we know and love from the research context which are on the cusp of being used clinically. For us at Mount Sinai, multiple myeloma is probably the most advanced in terms of leveraging transcriptomics and everything else. We have had a lot of experience with immune profiling as well, which is not really genomics, but it's still omics.
Bobby Sebra and his lieutenants run our genomic technology development group. We are still obviously very savvy with biosciences. [We do] a lot of single-molecule sequencing in general. We have spent a couple of years now getting pretty good with Berkeley Lights. We have been applying that in cancer for single-cell work. That group is partnering very deeply with our multiple myeloma group to do longitudinal sampling and trying to understand at the single-cell level what's actually going on, when you need single-cell waveform, when you need bulk to monitor progress of disease and response to therapy. There is definitely a lot of interesting work to be done.
The correct balance of technologies needed to get appropriate statistical sampling in your population and enough data richness to answer the more mechanistic questions you want to answer is always an interesting tradeoff. We spend a lot of time partnering computational people with the genomics people to try and get those answers.
As an organization, we've tried to get good at threading the needle. If we want to be able to figure out whether this is a real phenomenon or not, when do you tweak the genomics and when do you tweak the experiment design and when do you tweak the algorithm and when do you tweak the statistics to actually get that most efficiently?
That's something that you're starting to grapple with now in your new job?
Absolutely. But realizing that for a lot of these things, all the experimental design in the world is not your problem. You first need to get a good list of data that is mapped through appropriate namespaces. That can be a significant issue. Life is a lot easier now that we've got as many hospitals as we do on Epic because every single clinical micro[biology] lab result is now going into a single repository. We hit that for free because we're now all on Epic. We don't have to try and map one clinical micro lab to another because we have gotten down to one micro lab.
Back to the idea of you having to learn on the job. Are you having to educate some of the medical staff and some of the research staff there about good data practices?
Less and less. In primary care, we just hired one of our medical genetics residents, Ayuko Iverson, who's going to be boarded in internal medicine and genetics. Just like Noura, she's going to be interfacing with primary care. They're really carrying the lion's share of the load of that education, actually. And genetic counselors have long been engaged in cancer, so I don't have a whole lot to do there, just because we've built that infrastructure over the years. That's not to say there's not more that needs to be done, but I think our senior executives know that that's out there. They know that people can be directed in those directions, so I don't have to do a whole lot of genomics.
I guess where I am spending a bit of time is thinking if you really want to get genomics results actionable in the EMR, how would you do that? If you want to be able to run a query that goes across our operational EMR to get genomic test results in combination with other things, you're going to have to have discrete fields that are strongly typed with particular alleles or at least [whether] there is a variant in this gene present that either we think is a known pathogenic or we think it's likely benign or whatever it is. You have to get that integrated into [your systems]. That's not something that the Epic Clarity [reporting database] or Epic's Caboodle [enterprise data warehouse] is set up to do right now.
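To make the idea of discrete, strongly typed fields concrete, here is a minimal Python sketch. Everything in it is a hypothetical illustration: the `VariantResult` record, the `Classification` categories, and the placeholder gene names, alleles, and patient IDs are assumptions made for this example and do not describe Epic Clarity, Caboodle, or any Mount Sinai system.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, List


class Classification(Enum):
    """Hypothetical controlled vocabulary for a variant interpretation."""
    PATHOGENIC = "pathogenic"
    LIKELY_PATHOGENIC = "likely_pathogenic"
    UNCERTAIN = "uncertain_significance"
    LIKELY_BENIGN = "likely_benign"
    BENIGN = "benign"


@dataclass(frozen=True)
class VariantResult:
    """One genomic test result stored as discrete fields, not free text."""
    patient_id: str
    gene: str
    allele: str
    classification: Classification


def patients_with_actionable_variant(
    results: Iterable[VariantResult], gene: str
) -> List[str]:
    """Return sorted patient IDs with a (likely) pathogenic variant in `gene`."""
    actionable = {Classification.PATHOGENIC, Classification.LIKELY_PATHOGENIC}
    return sorted(
        {
            r.patient_id
            for r in results
            if r.gene == gene and r.classification in actionable
        }
    )


# Placeholder records; gene names, alleles, and IDs are invented.
RESULTS = [
    VariantResult("P001", "GENE_A", "c.100A>G", Classification.PATHOGENIC),
    VariantResult("P002", "GENE_A", "c.250C>T", Classification.LIKELY_BENIGN),
    VariantResult("P003", "GENE_A", "c.100A>G", Classification.LIKELY_PATHOGENIC),
    VariantResult("P003", "GENE_B", "c.42del", Classification.UNCERTAIN),
]

if __name__ == "__main__":
    print(patients_with_actionable_variant(RESULTS, "GENE_A"))
```

Because the classification is an enumerated value rather than a phrase buried in a report, the query reduces to a simple filter that could be combined with other discrete fields, such as laboratory results.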
What do you see as your biggest challenge in the world of genomics in terms of data?
You know, that's actually not clear to me. You might be tempted to say dimensionality reduction, because at the end of the day, it is still a big clinical problem to go from exome sequence to actionable clinical information. So that's certainly something. On the other hand, if you're looking at a BioMe repository, there's a lot of genomic information there that could be practically married to the EMR. There are a lot of people working on that. That's a significant bottleneck, too. Targeting the biggest challenge on the genomic data front is not simple to me.
Another potential challenge is when do you need the single-cell genomics and when is that unnecessary effort? That comes down to the question: Are you better served, for a research study, by lots of unique biological replicates or by that single-cell precision on a smaller number of distinct biological samples? Depending on your question, the answer could be different. I think clarifying those questions and figuring those answers out is sometimes a bigger challenge than other things. Of course, there are always better ways to mine the data we have.
I do also think that there are looming regulatory questions there as well. The exact understanding of DNA sequencing information under HIPAA is something which will need to be thought about and is being considered by a number of regulatory bodies. Depending on how all that nets out, that would have major implications about an awful lot. That's not a genomics question at all, but it's definitely germane to how we use genomic data from a health system standpoint and a clinical value standpoint. That's not small potatoes.
If I talk to you a year from now, what would you expect to have accomplished in your first year in this role?
Well, certainly having a better sense of our inventory and of what the most valuable activities we can engage in are, in terms of enabling both clinical care and research, would be high on the list. Some of these things are going to be pretty prosaic. Some of them actually may turn out to be quite interesting. I think a better understanding of how we translate from the good research more effectively into the clinic, I think that's an important thing. We've got a pretty strong track record with that role, with what Sema4 does, and other things. But there's a lot we can learn I think specifically in the transcriptomics area. That's going to be an important thing as well.
I think a year from now, I'll probably look back and laugh at a lot of what we've been saying. I would think that it would be good to have a stronger answer in terms of a business case for pharmacogenomics and population health. That's something that we and others would like to explore and something that we may ... pay more attention to. That's not too hard to think about. It's a fairly discrete use case.