Intel Launches Infrastructure to Help Institutions, Hospitals Share Private Datasets


NEW YORK (GenomeWeb) – Intel is putting in place a general platform-as-a-service (PaaS) infrastructure that will enable hospitals and research institutions interested in sharing private genomic, imaging, and clinical datasets to do so securely, without compromising the privacy of the contributing patients.

As a first step, Intel is setting its sights on using its infrastructure to support efforts to share cancer data. This week, it announced the development of a so-called Collaborative Cancer Cloud (CCC), which will enable participating institutions to run remote queries on oncology clinical and research datasets held by other institutions that have agreed to share their information. Intel showcased the first iteration of the platform, which it developed in collaboration with researchers at Oregon Health & Science University's Knight Cancer Institute, at its annual developer forum held this week in San Francisco.

At the meeting, the partners presented the results of a successful proof-of-concept test that involved creating multiple versions of an OHSU dataset, hosting these data at a total of three sites, and then using the CCC to run queries across sites. As a next step, the partners announced that they'll be running a second test in the first quarter of 2016 with two other yet-to-be-named cancer institutions. That test is intended to further assess the efficacy of the system and show that it can work in actual clinical use. After it concludes, Intel will release the infrastructure for broader use by the genomics community.

The planned system provides a way around the security, policy, and intellectual property concerns that have hampered data sharing, according to Eric Dishman, general manager of Intel's Health & Life Sciences Group. Currently, there are ongoing efforts that aim to combine public datasets for precision medicine research — and these have borne fruit — but there is a "wealth of the data ... sitting in the private data centers of hospitals and cancer centers and clinics," he said during a conversation with GenomeWeb following the announcement. "Until we can find a way to allow people to securely collaborate and make that available for analysis without giving up control of [the data], you'll never reach the numbers that you need to make the science and research work or clinical outcomes work."

Part of the hope here is that removing barriers to data access will help shorten the time to insights for patients. As a cancer survivor himself, Dishman has firsthand experience of how lengthy the road to personalized treatment can be. In his case, he told GenomeWeb, it took seven months to go from sequencing to analysis to treatment. About four of those months were spent shuttling disks of genomic information back and forth between treatment centers and aggregating clinical data from all the hospitals where he'd been treated over his 23-year cancer journey. "We need to figure out how to make this happen much more quickly," he said.

Brian Druker, a physician and director of Oregon Health & Science University's Knight Cancer Institute, said he sees patients go through similar experiences on a regular basis. A system that would enable oncologists to quickly compare their patients to larger cohorts and identify similar individuals who responded well to particular therapies would go a long way toward shortening treatment times. Ultimately, the goal is to be able to come back to a patient within a day and offer them the best treatment regimen possible based on current knowledge about their tumors, he said.

To access information held in disparate databases, the system uses Intel-developed technology to wrap research queries in secure containers and then sends the contained queries to wherever the datasets of interest are located. The queries are run and the results are returned without revealing any patient data to the sender or revealing details of queries to the receiving institution.
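The query pattern described above can be illustrated with a minimal Python sketch. It is not Intel's implementation: the class names, record fields, and sample data are all hypothetical, and the secure containers and trusted execution layer are not modeled, nor is hiding the query's details from the receiving institution. It only shows the basic idea that a query travels to each site and only aggregate results travel back.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class MutationQuery:
    """Hypothetical query: how many patients carry a given variant?"""
    gene: str
    variant: str


@dataclass
class Site:
    """Hypothetical institution whose patient records stay local."""
    name: str
    records: List[Dict[str, str]]

    def run(self, query: MutationQuery) -> Dict[str, int]:
        # Only aggregate counts are returned, never row-level records.
        matches = sum(
            1
            for r in self.records
            if r["gene"] == query.gene and r["variant"] == query.variant
        )
        return {"matches": matches, "total": len(self.records)}


def federated_count(sites: List[Site], query: MutationQuery) -> Dict[str, int]:
    """Send the query to every site and combine the aggregate results."""
    combined = {"matches": 0, "total": 0}
    for site in sites:
        result = site.run(query)
        combined["matches"] += result["matches"]
        combined["total"] += result["total"]
    return combined


# Made-up sample data for two hypothetical sites.
site_a = Site("site_a", [
    {"gene": "BRAF", "variant": "V600E"},
    {"gene": "BRAF", "variant": "V600E"},
    {"gene": "KRAS", "variant": "G12D"},
])
site_b = Site("site_b", [
    {"gene": "BRAF", "variant": "V600E"},
    {"gene": "EGFR", "variant": "L858R"},
])

summary = federated_count([site_a, site_b], MutationQuery("BRAF", "V600E"))
print(summary)
```

In this sketch the caller only ever sees the combined counts. A real deployment would also need authentication, encrypted transport, and safeguards against re-identification from small counts, none of which are shown here.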

The container technology is built on Intel's trusted execution technology, which is used for authenticating and ensuring the security of computing systems. A sample query that could be run using the system might target how often a particular mutation suspected of causing cancer has been observed in cancer patients across multiple sites, whether there are any treatments targeting it, and how patients have responded, Druker said. The platform also links to existing public repositories and pulls in information from these sources for comparison. In addition, Intel is making available open-source solutions that it has optimized to run on its Xeon processors within the cloud platform.

Intel plans to make components of the CCC open source, such as tools that address the challenges of holding multiple large files in memory at the same time for comparison, Dishman told GenomeWeb. This way, developers in the community can contribute to and improve the existing code, and institutions can use the open code to set up their own data-sharing communities and networks.

Although this initial iteration of the system focuses on cancer and includes some oncology-specific features — improvements that make tumor image analytics faster and more efficient, for instance — the underlying infrastructure is generalizable and can be adapted for genetically linked conditions such as Alzheimer's disease, diabetes, autism, and more. "We are just making sure that we prove it out in cancer first, but the open-source tools could be used by all sorts of [developers] to do lots of different things," Dishman said.

The first bits of code will be made available in the first quarter of 2016 with additional releases planned for later.

The CCC is optimized for Intel servers, where it performs best and most cost-effectively, but it can work with other kinds of infrastructure as well, according to Dishman. Depending on what sort of infrastructure institutions already have in place, they may need to make additional investments in resources and capabilities to support precision health analytics. To help with that, Intel will also publish reference architectures (documents describing best practices) in Q1 2016 that will help IT departments at these institutions bring their systems up to date, he said.