NEW YORK (GenomeWeb) – Stanford University researchers have developed a way to protect people's privacy while analyzing their genomes.
Researchers must compare genomes to one another in order to sniff out potentially disease-causing variants, but that process opens people's private data up to being viewed by others. As they reported today in Science, the Stanford researchers adapted an approach employed by cryptographers and used it to identify gene mutations responsible for rare diseases affecting four different groups of patients. At the same time, they kept at least 97 percent of the participants' genomes hidden from view.
"We now have the tools in hand to make certain that genomic discrimination doesn't happen," co-senior author Gill Bejerano said in a statement. "There are ways to simultaneously share and protect this information. Now we can perform powerful genetic analyses while also completely protecting our participants' privacy."
Currently, genomic data is kept secure either by restricting access to datasets to institutional users or by sharing only obscured summary statistics with outside users, the researchers said, noting that both approaches are suboptimal. They argued that enhancing data privacy would make people more comfortable participating in genomic research.
To protect data, the Stanford team turned to a cryptographic technique called Yao's protocol, which they combined with cloud computing. Part of their method involves each participant encrypting their own genome, likely using a simple algorithm on their own computer, into a linear vector that indicates only whether each of the gene variants being studied is present or not. That encrypted information is then uploaded to the cloud.
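The encoding step described above can be pictured with a minimal Python sketch. It shows only the presence/absence vector, before any encryption is applied. The function name `encode_variants`, the panel of variants, and the variant labels are hypothetical and do not come from the study.

```python
def encode_variants(panel, carried):
    """Return a 0/1 vector over an agreed panel of variants.

    panel   -- ordered list of variant identifiers under study (hypothetical labels)
    carried -- variants this participant actually carries
    """
    carried = set(carried)
    # 1 means "this participant carries the variant", 0 means "does not"
    return [1 if variant in carried else 0 for variant in panel]


# Hypothetical panel shared by everyone taking part in the analysis
panel = ["geneA:v1", "geneA:v2", "geneB:v1", "geneC:v1"]

vector = encode_variants(panel, {"geneA:v2", "geneC:v1"})
print(vector)  # [0, 1, 0, 1]
```

In the approach the article describes, a vector like this would be encrypted on the participant's side before being uploaded; that step is omitted here.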
Researchers then analyze the data in the cloud using secure multi-party computation. Through this, they have access only to information about the variants they are studying.
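Yao's protocol is built on "garbled" logic gates, and a toy version of a single gate gives a feel for how a computation can run without exposing its inputs. The Python sketch below garbles one AND gate. It is a teaching illustration under simplifying assumptions, not the Stanford team's implementation: the helper names are hypothetical, SHA-256 stands in for the encryption step, and the oblivious-transfer step that a real protocol uses to hand over input labels is left out.

```python
import hashlib
import random
import secrets

LABEL_LEN = 16  # bytes per random wire label


def _pad(label_a, label_b):
    # Derive a 32-byte one-time pad from the two input labels
    return hashlib.sha256(label_a + label_b).digest()


def _xor(x, y):
    return bytes(p ^ q for p, q in zip(x, y))


def garble_and():
    """Garbler side: build an encrypted truth table for a single AND gate."""
    # Each wire gets two random labels: index 0 encodes bit 0, index 1 encodes bit 1
    labels = {
        wire: (secrets.token_bytes(LABEL_LEN), secrets.token_bytes(LABEL_LEN))
        for wire in ("a", "b", "out")
    }
    table = []
    for bit_a in (0, 1):
        for bit_b in (0, 1):
            out_label = labels["out"][bit_a & bit_b]
            # Trailing zero bytes let the evaluator recognise the one row it can open
            plaintext = out_label + bytes(LABEL_LEN)
            table.append(_xor(_pad(labels["a"][bit_a], labels["b"][bit_b]), plaintext))
    random.shuffle(table)  # hide which row corresponds to which input bits
    return labels, table


def evaluate(table, label_a, label_b):
    """Evaluator side: holds one label per input wire and learns one output label."""
    pad = _pad(label_a, label_b)
    for row in table:
        candidate = _xor(pad, row)
        if candidate[LABEL_LEN:] == bytes(LABEL_LEN):
            return candidate[:LABEL_LEN]
    raise ValueError("no row could be opened with these labels")


labels, table = garble_and()
# The evaluator sees only random-looking labels, never the underlying bits
result = evaluate(table, labels["a"][1], labels["b"][1])
print(result == labels["out"][1])  # True: 1 AND 1 = 1
```

The evaluator can open exactly one row of the table and learns only the output label. Full circuits chain many such gates together so that only the final answer is decoded.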
"In this way, no person or computer, other than the individuals themselves, has access to the complete set of genetic information," Bejerano said.
He and his colleagues also developed three Boolean operations — INTERSECTION, SETDIFF, and MAX — to use in patient diagnoses. INTERSECTION uncovers rare variants that two parties share, while SETDIFF can be applied to affected and unaffected individuals to discard variants seen in healthy people. MAX, meanwhile, finds the gene that contains mutations in the greatest number of cases.
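The logic of the three operations can be sketched in plain Python on unencrypted sets. This shows only what each operation computes. In the method the article describes, the same logic would run inside the secure computation so that the inputs stay hidden. The function names, signatures, and tie-breaking rule below are assumptions made for illustration.

```python
from collections import Counter


def intersection(variants_a, variants_b):
    """Variants that both parties carry."""
    return set(variants_a) & set(variants_b)


def setdiff(affected, unaffected_list):
    """Variants in the affected individual that no unaffected individual carries."""
    remaining = set(affected)
    for unaffected in unaffected_list:
        remaining -= set(unaffected)
    return remaining


def max_gene(cases):
    """Gene mutated in the greatest number of cases.

    cases -- one collection of mutated gene names per patient.
    Returns (gene, number_of_cases), or None if there are no genes at all.
    Ties are broken alphabetically, which is an arbitrary choice for this sketch.
    """
    counts = Counter()
    for genes in cases:
        counts.update(set(genes))  # count each gene at most once per patient
    if not counts:
        return None
    gene = min(counts, key=lambda g: (-counts[g], g))
    return gene, counts[gene]


# Hypothetical toy data
print(intersection({"v1", "v2"}, {"v2", "v3"}))         # {'v2'}
print(setdiff({"v1", "v2", "v3"}, [{"v1"}, {"v3"}]))    # {'v2'}
print(max_gene([{"G1", "G2"}, {"G2"}, {"G2", "G3"}]))   # ('G2', 3)
```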
They applied their approach and the MAX operation to four small cohorts with rare diseases: Freeman-Sheldon syndrome, Hajdu-Cheney syndrome, Kabuki syndrome, and Miller syndrome. Each individual had a private list of between 211 and 374 rare functional variants in 210 to 356 genes.
The operation found that the genes most often mutated in these cohorts were the ones that had previously been reported as the causal gene. This approach only revealed to the researchers the variants in the most mutated gene in each cohort, keeping between 99.2 percent and 99.7 percent of the cohorts' genomes private.
Similarly, when they used the SETDIFF operation to analyze data from a parent-child trio, the analysis revealed only the rare variants that were present in the child but not in the parents. They estimated that this kept 99.6 percent of the genomic data private.
The researchers also used the INTERSECTION operation to compare 928 patients at Washington Mendelian Center and 282 from the Baylor Hopkins Center. This uncovered 159 variants present in both patient sets. They estimated that this approach kept 97.1 percent of the data private.
The phenotypes of patients with those variants could then be compared in follow-up analyses, the researchers said.
These operations were also fairly speedy. The researchers reported that they took between a few seconds and just shy of 10 minutes to complete.