CHICAGO – Bioinformaticians at a Swiss teaching hospital have developed a patient-facing application to address concerns about genomic data privacy by quantifying the privacy risk an individual faces when disclosing genomic data to a third party, based on the user's personal preferences.
They hope that this software, called GenoShare, will provide a basis for informed decision-making in sharing this sensitive information safely, thus unlocking some of the potential of precision medicine.
"The goal of GenoShare is to put people back into control of what data they want to share with whom and inform them about potential consequences in terms of privacy risks," explained Jean Louis Raisaro, data science and research lead for precision medicine at Lausanne University Hospital in Switzerland, and creator of the application. Raisaro introduced GenoShare at the virtual 2020 American Medical Informatics Association (AMIA) annual symposium last week.
"It's a tool that doesn't provide you with formal guarantees of privacy, but it will inform you about potential risks," Raisaro said in a follow-up interview. "It provides more transparency in the process and brings the individual to the center so he feels more empowered about this data."
Raisaro's team at the hospital — known by its French acronym CHUV — prepared a short manuscript for the Medical Informatics Europe (MIE) conference that had been scheduled for April but was canceled due to the COVID-19 pandemic. The European Federation for Medical Informatics published proceedings of MIE2020, but the AMIA session was the first public presentation of GenoShare.
CHUV has already written a technical report, available only by request, on the current prototype version of GenoShare that goes beyond the MIE paper.
Raisaro first developed the concept for GenoShare while working on a PhD at the Swiss Federal Institute of Technology Lausanne. He described the technology in his 2018 doctoral thesis and turned the idea into a prototype after joining the CHUV faculty.
Raisaro said that his group ran a proof-of-concept trial of the technology with about 30 patients who contributed to CHUV's biobank. Researchers were trying to assess awareness of genomic privacy issues, usability of the GenoShare system, and people's perception of whether it was useful for decision-making. He said that the test group was too small to yield meaningful conclusions.
A more detailed pilot of GenoShare is in the works, but that will not happen until Raisaro and colleagues complete a fuller version of the MIE manuscript for submission to a journal in early 2021. Following the pilot in the first half of next year, CHUV plans to release the programming code to the open-source community, according to Raisaro.
He called GenoShare a "modular framework" that simulates various types of attacks on patient genomic data.
"People are getting a little bit concerned about sharing genomic data," Raisaro said. "The idea was to build the framework to enable systematic reasoning about all these risks."
Such concerns stem from a 2008 paper in PLOS Genetics from the Translational Genomics Research Institute and the University of California, Los Angeles, which showed how easily individuals could be identified from even trace amounts of DNA in genome-wide association studies, according to Raisaro.
That research caused the US National Institutes of Health to control access to many sets of aggregated genomic data, Raisaro noted, and spawned communities of genetic privacy specialists, who developed "inference attacks" that demonstrate how potentially malicious actors might exploit genomic identifiers.
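As a rough illustration of one such inference attack, the sketch below computes a membership statistic loosely modeled on the distance measure associated with that 2008 paper: for each variant, it asks whether a person's allele frequency sits closer to a study pool's published frequency than to a reference population's. The function name and every number here are invented for illustration, real attacks work over thousands of variants with a formal significance test, and none of this is GenoShare's code.

```python
from statistics import mean
from typing import Sequence


def membership_score(person: Sequence[float],
                     pool: Sequence[float],
                     reference: Sequence[float]) -> float:
    """Mean of |y - ref| - |y - pool| across variants.

    person:    the individual's allele frequency per variant (0, 0.5 or 1)
    pool:      allele frequency per variant published by the study
    reference: allele frequency per variant in a reference population
    A positive score means the person looks more like the study pool
    than like the reference population, hinting at participation.
    """
    if not (len(person) == len(pool) == len(reference)):
        raise ValueError("all inputs must cover the same variants")
    return mean([abs(y - r) - abs(y - p)
                 for y, p, r in zip(person, pool, reference)])


# Toy data for five variants (invented values, not real genotypes).
person = [1.0, 0.5, 0.0, 1.0, 0.5]
pool = [0.8, 0.5, 0.1, 0.9, 0.4]
reference = [0.5, 0.3, 0.3, 0.6, 0.2]

print(membership_score(person, pool, reference) > 0)
```

In this toy case the score is positive, which is the kind of signal an adversary holding only summary statistics and a target's genotype might treat as evidence of study participation.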
"Leakage from me will cause leakage from it to my family members," he said. "And in the genome itself, there is sensitive information because you could infer predisposition to diseases and stigmatizing traits."
For example, GenoShare determined that a patient choosing to reveal a set of 400 genetic variants related to schizophrenia "introduces a risk of almost 100 percent of leaking the value of her predisposition to bipolar disorder and her participation in a study with less than 50 people, and a risk of 60 percent of leaking her kinship with a first-degree relative who might be known by the adversary," according to the manuscript.
The hospital acts as the "trusted data custodian" for studies it runs, Raisaro said. GenoShare simply helps people understand the privacy implications of consenting to participate in trials, whether the hospital conducts them or not.
Raisaro explained that there are two types of attacker profiles: one where it is assumed that a bad actor only has access to publicly available information, and a "worst-case scenario" in which the attacker knows the genome of a relative of the test subject.
The GenoShare technology framework has three basic components: an attack simulation, risk scoring, and a recommendation based on individual privacy preferences to release or withhold the requested data. The software produces visualizations to express risk to consumers.
The simulation considers the genetic information requested, previously available data, and the background knowledge possessed by an adversarial player. Adversarial knowledge could include family relationships, genotype-phenotype correlations, summary data from other studies, and population-based allele frequencies.
Metrics for determining risk of attacks include a third party's reputation for data handling, profiles of various potential attackers, and the portion of the DNA sequence that needs to be shared for a specific use. The software simulates malicious attacks and breaches, and outputs a privacy risk score on a scale from zero to 100.
Individuals set thresholds on this scale for risks they are willing to accept from disclosing their data, and GenoShare sends an alert if a particular risk exceeds the threshold. Individuals are then free to choose whether to participate in a study or not based on the GenoShare recommendation.
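The threshold step described above can be sketched in a few lines. This is a minimal illustration under assumptions, not GenoShare's code: the function name `recommend`, the risk-category labels, and the sample scores are hypothetical. Only the zero-to-100 scale and the comparison of each risk against a user-set threshold come from the description.

```python
from typing import Dict, List, Tuple


def recommend(scores: Dict[str, float],
              thresholds: Dict[str, float]) -> Tuple[str, List[str]]:
    """Compare simulated risk scores (0-100) with the user's thresholds.

    Returns a recommendation ("release" or "withhold") and the sorted list
    of risk categories whose score exceeds the user's threshold.
    """
    alerts = []
    for risk, score in scores.items():
        if not 0 <= score <= 100:
            raise ValueError(f"score for {risk!r} is outside 0-100")
        # A category with no stated threshold is treated as zero tolerance
        # (an assumption made for this sketch).
        if score > thresholds.get(risk, 0):
            alerts.append(risk)
    return ("withhold" if alerts else "release", sorted(alerts))


# Hypothetical scores and thresholds for three kinds of risk.
scores = {"trait_inference": 99.0, "membership": 99.0, "kinship": 60.0}
thresholds = {"trait_inference": 50.0, "membership": 100.0, "kinship": 75.0}

decision, alerts = recommend(scores, thresholds)
print(decision, alerts)  # withhold ['trait_inference']
```

Keeping a separate threshold per risk category, rather than one global cutoff, mirrors the idea that a person may tolerate one kind of leak (for example, study membership) far more readily than another.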
CHUV ran its initial simulation by testing GenoShare algorithms against genomes in the 1000 Genomes Project Consortium database. Raisaro said that he intends to stick to public databases for the upcoming pilot, but his hope is that the software will be able to make accurate risk assessments with any type of genomic dataset.
He said that the 1000 Genomes Project data is helpful to build profiles of potential adversaries. "The adversary has all the access to publicly available data, so this data has to be taken into account by GenoShare for inference attacks," Raisaro explained.
As it stands now, GenoShare is a web-based app that has been optimized for desktop and mobile use. Raisaro said he is considering several types of future deployments, including through an application programming interface to embed it into commercial direct-to-consumer genetic testing platforms such as 23andMe and Ancestry. Neither company is currently involved in the development.
Understanding consent would be particularly important in a DTC environment because such companies often share data with pharmaceutical companies, according to Raisaro.
"This can also be a tool that [a DTC genetic testing] company could use for enabling transparency and control for the users, because right now it's a yes or no decision. It is completely binary. Either you share or you don't share," he said.
GenoShare would allow users to share specific portions of their genomes based on their risk tolerance for each research opportunity. "This could reinforce trust between the customer and this kind of service provider," he said.
He envisions GenoShare not as a DTC app, but as something that an intermediary such as a hospital, contract research organization, or testing laboratory offers to patients.
The development is related to a wider health privacy effort in Switzerland called Data Protection in Personalized Health (DPPH) and a national data interoperability program known as the Swiss Personalized Health Network. DPPH is a four-year, CHF 3 million ($3.3 million) undertaking to address privacy and security concerns related to the sharing of health information.
As part of those programs, CHUV informaticians are also developing a federated, privacy-focused analytics platform called MedCo, based on cryptographic techniques including homomorphic encryption and secure multiparty computation. Raisaro said that his team is testing that technology at several Swiss university hospitals and hopes it can be deployed nationally for oncology data next year.
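For readers unfamiliar with secure multiparty computation, the sketch below shows additive secret sharing, a textbook building block that lets several sites compute a total without any site revealing its own number. It is an assumption-laden illustration only: MedCo's actual protocols are not described here, and the function names, modulus, and patient counts are hypothetical.

```python
import secrets
from typing import List

# Public modulus; it only needs to be larger than the true total.
MODULUS = 2**61 - 1


def share(value: int, parties: int) -> List[int]:
    """Split value into `parties` random shares that sum to value mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def secure_sum(private_values: List[int]) -> int:
    """Simulate all sites in one process to show the arithmetic."""
    n = len(private_values)
    # Row i holds the shares that site i hands out, one per site.
    distributed = [share(v, n) for v in private_values]
    # Each site adds up the shares it received (column j) and publishes
    # only that partial sum, which looks random on its own.
    partial_sums = [sum(row[j] for row in distributed) % MODULUS
                    for j in range(n)]
    return sum(partial_sums) % MODULUS


counts = [12, 7, 30]  # hypothetical patient counts at three hospitals
print(secure_sum(counts))  # 49
```

Each individual share is a uniformly random number, so no single message exposes a hospital's count; only the combined total is recoverable.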