NEW YORK (GenomeWeb) – Researchers in Switzerland have explored the use of encrypted genomic data for computing and reporting clinical genetic test results in order to help protect the privacy of patients.
In a pilot study published online last week in Genetics in Medicine, the team, led by scientists at the École Polytechnique Fédérale de Lausanne and the Swiss Institute of Bioinformatics in Lausanne, tested the implementation of privacy-preserving genetic testing and reporting for 230 HIV-infected patients.
Using strategies that allow for the secure storage and analysis of large-scale genetic data and for the targeted delivery of subsets of test results to clinicians "will become increasingly important as many large-scale sequencing efforts are initiated, with the goal of incorporating the resulting genomic data into clinical care," they wrote.
Because individuals can easily be identified from their genome, genomic data requires extra layers of protection and security. On the other hand, the data needs to be accessible to provide doctors and researchers with the genetic test results they need.
But just as a shopkeeper who takes your credit card for payment does not need access to all your banking information, a doctor does not need to know about your entire genome, and sometimes not even the specific genotypes underlying a test result, Amalio Telenti, a senior author of the study and former University of Lausanne researcher who is now a scientist at the J. Craig Venter Institute, told GenomeWeb.
"We provided the participants of this experiment with a theoretical code, like your credit card, and they could release by that means the data that they felt was needed by the doctor," he said. "The idea is that you control the genome long-term, that you give access only to the people you want, and that these people, or the institution, can only know what you feel they need to know."
The publication, he said, is likely the first application of homomorphic encryption, a type of encryption that allows computations to be run on data while it remains encrypted, to clinical genomic testing. "The data is accessible for queries, for example to calculate my Alzheimer's risk or cardiovascular risk, or my ancestry, but the actual data behind is not visible, it's encrypted," he explained.
For their study, the researchers recruited 230 HIV-infected patients undergoing antiretroviral therapy, and genotyped them for more than 4,100 variants. Next, they encrypted these variants through homomorphic encryption, which took about 12 minutes per patient, a time that could be reduced by parallel computing.
They then computed a number of genetic tests on the encrypted data that assessed potentially actionable HIV-related variants. These tests addressed, for example, abnormal drug concentrations and toxicity, HIV- and treatment-associated metabolic disorders, HIV and hepatitis C virus coinfection, and predictions of disease progression, and included both deterministic information — for example abacavir hypersensitivity linked to the HLA-B*57 allele — as well as risk information.
Because the predictive markers used on some of these tests have only been validated in a European population, the researchers also needed to infer patient ancestry from the encrypted data.
In total, they tested for 17 traits, with the number of informative SNPs for a single trait ranging from one to 22. All but two patients had at least one positive result, and all tests could be performed and reported in less than a second per patient, they wrote.
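To illustrate how a test result can be computed from genotypes that stay encrypted, here is a minimal sketch of an additively homomorphic scheme. It is a toy version of the Paillier cryptosystem, chosen only for illustration: the article does not say which scheme, key sizes, or scoring rules the study used. The genotype values and weights are hypothetical, and the primes are far too small for real use.

```python
import math
import secrets

# Toy Paillier cryptosystem (additively homomorphic). Illustration only:
# the primes are far too small for real use, and this is an assumed scheme,
# not necessarily the one used in the study. Requires Python 3.8+.

def keygen(p=2**31 - 1, q=2**61 - 1):
    """Return (public modulus n, private key) from two known primes."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # modular inverse; valid because g = n + 1
    return n, (lam, mu)

def encrypt(n, m):
    """Encrypt integer 0 <= m < n with fresh randomness."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    # With g = n + 1, g**m mod n**2 equals 1 + m*n.
    return (1 + m * n) * pow(r, n, n2) % n2

def decrypt(n, private_key, c):
    """Recover the plaintext integer from ciphertext c."""
    lam, mu = private_key
    n2 = n * n
    return (pow(c, lam, n2) - 1) // n * mu % n

def encrypted_score(n, ciphertexts, weights):
    """Compute Enc(sum(w_i * g_i)) without decrypting any genotype."""
    n2 = n * n
    acc = encrypt(n, 0)
    for c, w in zip(ciphertexts, weights):
        # Multiplying ciphertexts adds plaintexts; raising a ciphertext
        # to the power w multiplies its plaintext by w.
        acc = acc * pow(c, w, n2) % n2
    return acc

# Hypothetical example: allele counts (0, 1, or 2) at four SNPs and
# made-up integer weights for a single trait.
n, private_key = keygen()
genotypes = [0, 1, 2, 1]
weights = [3, 5, 2, 7]
encrypted = [encrypt(n, g) for g in genotypes]
score_ct = encrypted_score(n, encrypted, weights)
print(decrypt(n, private_key, score_ct))  # 16
```

In this sketch the party that runs `encrypted_score` sees only ciphertexts. Only the holder of the private key can read the final score, and the individual genotypes are never decrypted.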
Clinicians received a report with interpreted test results instead of the patient's raw genetic data. When no significant result was found, the report simply stated "no relevant alleles found." Physicians also received access to the testing framework, allowing them to evaluate the evidence underlying each test result if they were interested.
Physicians also received a questionnaire asking them about the utility of the report. Slightly more than half said the test results were useful or potentially useful, but only 42 percent said they would discuss the results with their patients. Surprisingly, in cases where a genetic test suggested a medication was contraindicated but the patient had been prescribed that drug already, only 10 percent of physicians said they would have prescribed a different first-line treatment if they were given the genetic results in advance.
Scaling this approach to more than a few thousand variants would slow it down significantly: encrypting the 4 million variants that are typically found in a person's genome, for example, would take about 200 hours. But this would need to be done only once and could be sped up by precomputation and parallel computing, the scientists wrote. Also, while complex operations on encrypted data, like genotype imputation, are currently computationally limited, they may become possible in the future as computational efficiency improves. And according to Telenti, community challenges, like the one run by the Integrating Data for Analysis, Anonymization, and Sharing (IDASH) center, continue to push the boundaries of computing on encrypted genomic data.
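The 200-hour figure is roughly what a simple linear extrapolation from the pilot gives. The sketch below assumes encryption time scales linearly with the number of variants and that parallel workers give an ideal speedup. Both are simplifying assumptions, not claims from the paper, and the function name and parameters are hypothetical.

```python
def estimate_encryption_hours(n_variants, ref_variants=4_100,
                              ref_minutes=12.0, workers=1):
    """Extrapolate encryption time from the pilot's reported figures.

    Assumes linear scaling in the number of variants and an ideal
    speedup across workers (both are simplifications).
    """
    minutes = ref_minutes * n_variants / ref_variants
    return minutes / 60 / workers

# About 12 minutes for roughly 4,100 variants, scaled to 4 million variants.
print(round(estimate_encryption_hours(4_000_000)))  # 195
```

Under the same assumptions, spreading the work across 16 workers would bring the estimate down to about 12 hours.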
The field of genomic privacy is relatively young but quickly evolving, he said. Right now, much genomic data is stored unencrypted behind the firewall of a clinical laboratory or medical institution, and while it may be encrypted in transit, it is usually not encrypted at the source or on the recipient's side, he said. This means that anyone who manages to breach the firewall can access the data, and so can those inside, who may be a large number of people. "Why should the pharmacist know more than your pharmacogenetic markers? Why should they know about your Alzheimer's risk or your ethnicity?" he asked.
Besides restricting unwanted access, encryption would also make it easier for institutions to query each other's data without sharing the raw data, for example for projects like the Matchmaker Exchange, which aims to connect databases with genomic and phenotypic patient information. "You can probably speed up communication between many institutions with minimal risk for reidentification or loss of data," he said.
And while the genomics research community may not be all that interested in encryption approaches because "it complicates life," he said, clinical clients getting into genomics may demand greater levels of data protection from providers of sequencing data. "And this is not necessarily going to be provided by a firewall for something like a full genome that will be sitting on computers for years and years," he said.
The Swiss team is now developing the approach described in the paper for use in a biobank at the University Hospital of Lausanne, as well as for the Swiss HIV Cohort Study.