NEW YORK (GenomeWeb) – Organizers of the Integrating Data for Analysis, Anonymization and Sharing (iDASH) challenges have announced the winners of the most recent iteration of the community challenge, which aims to evaluate the performance of methods for computing on genomic data securely in remote environments.
Specifically, a two-member team from Vanderbilt University won the first of the three challenges, which focused on developing stronger security for beacon queries. A six-member team comprising researchers from IBM, Cornell University, and Bar-Ilan University won the second challenge, which called for algorithms for more secure searching of patient data across organizations. Lastly, a seven-member team from Microsoft Research won the third challenge, which called for methods of searching homomorphically encrypted genomes stored on public clouds.
The results of this year's challenges were presented during the iDASH workshop held last month in conjunction with the American Medical Informatics Association's Annual Symposium in Chicago. The challenges, which have been held since 2014, are run under the auspices of the iDASH Center at the University of California, San Diego. iDASH is one of the National Institutes of Health's National Centers for Biomedical Computing. Previous iterations of the challenge have asked community members to develop homomorphic encryption protocols for encrypting data used in genome-wide association studies as well as distributed cryptographic protocols for encrypting and comparing datasets.
More than 50 teams from 13 countries registered for this year's competition, Shuang Wang, an assistant professor in the biomedical informatics department at UCSD and one of the challenge organizers, told GenomeWeb.
The organizers received 17 submissions for the three challenges that were issued for this iteration of the contest. The first task, dubbed the privacy-preserving dissemination challenge, called for participants to develop solutions to protect genomic data while it is shared using beacons. These are servers installed locally by institutions to which external users can send simple queries in the form of 'yes or no' questions. The motivation for this particular challenge grew out of a paper published last year by a pair of researchers at Stanford University School of Medicine. That paper demonstrates a technique for potentially re-identifying individuals using the beacon querying mechanism, which was originally set up to enable anonymized data sharing between parties. "This is a very significant problem and there are a lot of potential risks so this motivates us to develop a protection method that can protect the beacon framework," Wang said.
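To make the 'yes or no' query style concrete, the following is a minimal Python sketch of a beacon-like lookup. It is an illustration only, not the GA4GH Beacon API and not any team's submission; the `Variant` and `ToyBeacon` names and the sample data are hypothetical.

```python
from typing import Iterable, NamedTuple, Set


class Variant(NamedTuple):
    """Hypothetical record: an allele observed at a genomic position."""
    chrom: str
    pos: int
    allele: str


class ToyBeacon:
    """Illustrative stand-in for a beacon server (not a real API).

    It pools the variants of all genomes it holds and answers only
    whether a queried allele is present in at least one of them.
    """

    def __init__(self, genomes: Iterable[Iterable[Variant]]) -> None:
        self._seen: Set[Variant] = set()
        for genome in genomes:
            self._seen.update(genome)

    def query(self, chrom: str, pos: int, allele: str) -> bool:
        # The caller learns a single bit: present or not present.
        return Variant(chrom, pos, allele) in self._seen


# Made-up example data for two genomes.
genomes = [
    [Variant("1", 1000, "A"), Variant("2", 5000, "T")],
    [Variant("1", 1000, "A"), Variant("3", 42, "G")],
]
beacon = ToyBeacon(genomes)

print(beacon.query("1", 1000, "A"))  # True
print(beacon.query("3", 42, "C"))    # False
```

Each answer reveals only one bit, but the answers to many queries can add up, which is the kind of leakage the re-identification paper mentioned above is concerned with.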
A second challenge, called the secure multiparty computing challenge, expands on one of the challenges posed last year, which asked for secure protocols that let researchers perform sequence similarity searches on datasets stored at disparate institutions. The most recent version similarly asked participants to come up with methods for identifying similar patients in disparate datasets, but this time it focused on measuring the edit distance between query sequences and sequences contained in third-party databases rather than the Hamming distance, which was the focus of last year's challenge, Xiaoqian Jiang, an assistant professor in the biomedical informatics department at UCSD and co-organizer of the challenge, explained to GenomeWeb. "This challenge asked teams to contribute solutions [that] satisfy the most rigorous security and privacy guarantees but do this matchmaking in an efficient way."
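The difference between the two distance measures can be shown with a short plaintext sketch in Python. It uses the textbook definitions and made-up sequences; it does not attempt the secure multiparty computation the challenge actually required, and the function names are chosen here for illustration.

```python
def hamming_distance(a: str, b: str) -> int:
    """Count positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, and substitutions turning a into b."""
    # prev[j] holds the distance between the processed prefix of a
    # and the first j characters of b.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


# Made-up sequences: the second is the first shifted by one base.
query = "ACGTACGT"
record = "CGTACGTA"

print(hamming_distance(query, record))  # 8
print(edit_distance(query, record))     # 2
```

A one-base shift makes every position mismatch under Hamming distance, while edit distance accounts for it with one deletion and one insertion. Edit distance needs the dynamic-programming table above rather than a single pass, which makes it costlier to evaluate under cryptographic protection.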
The third challenge, called the secure outsourcing challenge, focuses on methods that allow data owners to encrypt data and outsource the computation and storage to public clouds. Specifically, for this challenge, participants were asked to develop methods for calculating the probability that patients have a genetic disease. These methods will allow researchers to encrypt patient data and store it on the cloud, where it can be searched for potential disease biomarkers.
The idea would be to outsource the computation on these datasets to public clouds such as Amazon, enabling researchers to scale up their compute resources as the size of their datasets grows, Wang explained. NIH's genomic data sharing policy allows researchers and institutions to use public clouds for genomic data analysis, but they are responsible for ensuring the security and privacy of the data. "Using homomorphic encryption methods, genomic data can be encrypted during data transmission, storage, and computation," he said. For the third challenge, the organizers simulated a cloud environment using the iDASH infrastructure, but the protocols can be easily exported to Amazon and other public clouds, Jiang said. They could be testing the protocols on public clouds next year, he added.
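The idea of computing on data that stays encrypted can be illustrated with a toy Python sketch. It uses the Paillier scheme, which supports only addition on ciphertexts, as a stand-in chosen here for brevity; the article does not say which schemes the teams used, and the tiny primes, function names, and 0/1 "risk allele" flags are all assumptions for illustration. It is not secure and not suitable for real data.

```python
import math
import secrets


def keygen(p: int, q: int):
    """Toy Paillier key generation from two small primes (insecure sizes)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # modular inverse (Python 3.8+); valid since g = n + 1
    return n, (lam, mu)


def encrypt(n: int, m: int) -> int:
    """Encrypt integer m (0 <= m < n) under public key n."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1  # random r in [1, n-1]
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def add_encrypted(n: int, c1: int, c2: int) -> int:
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return (c1 * c2) % (n * n)


def decrypt(n: int, priv, c: int) -> int:
    lam, mu = priv
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n


# Data owner: encrypt hypothetical 0/1 risk-allele flags for one patient.
n, priv = keygen(1009, 1013)
flags = [1, 0, 1, 1]
ciphertexts = [encrypt(n, f) for f in flags]

# "Cloud": sums the flags without ever seeing them in the clear.
encrypted_total = ciphertexts[0]
for c in ciphertexts[1:]:
    encrypted_total = add_encrypted(n, encrypted_total, c)

# Data owner: decrypts the result with the private key.
total = decrypt(n, priv, encrypted_total)
print(total)  # 3
```

Only the holder of the private key can read the sum; the party doing the arithmetic handles ciphertexts throughout. Calculating something like a disease probability would require richer operations than addition alone.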
The iDASH organizers have begun planning next year's challenges. They are currently working with the GA4GH to plan a challenge that will focus on methods for securely matching patients across disparate resources. With so many institutions participating in the alliance and contributing datasets, there is a chance that some patient entries are duplicated across institutions, which can bias results, Jiang explained. So the challenge could potentially focus on creating methods for linking patient data across resources without compromising the privacy and security of the data, he said. The second and third tasks are still being developed at this time but will likely include at least one challenge involving hybrid hardware, Wang said. The organizers plan to announce the new tasks sometime in March 2017.
The organizers also plan to publish details of the winning methods in a special issue of BMC Medical Genomics that they expect will be published early next year, Jiang said.
The 2016 iDASH challenge was supported by a grant from the NIH's National Human Genome Research Institute. Wang and Jiang are both named as co-principal investigators on the grant. Those funds were used to provide six travel awards for participating students. The contest also received support from Human Longevity and Genecloud, which provided cash awards for the winning teams. The winning teams each received a cash award of $500, while several runners-up received $200 apiece.