NEW YORK (GenomeWeb) – Intertrust Technologies is bringing its expertise in data privacy, internet security, and content protection to the healthcare and life sciences arena with the launch of Genecloud, a cloud-based healthcare analytics platform built on Amazon Web Services that uses computerized policies to govern interactions between datasets and the software programs that operate on them.
Intertrust's methods control the way that third-party researchers access and interact with sensitive datasets, ensuring that the data-access principles that the data owners have put in place are upheld. This includes a full set of policy controls to manage individual and group access to data, as well as access control rules that let users determine what data attributes are off limits for a given party and limit access to data to maximize anonymity. There are also tools for tracking all actions performed in the system including what datasets have been accessed and what analysis programs have been run.
The underlying idea for the system is that "if we hand people data, we can't really control what they do with it but if we ingest their programs and their analysis tools, we are actually able to do that kind of governance. Essentially that's what our platform does," Knox Carey, vice president of technology initiatives at Intertrust and Genecloud's general manager, explained to GenomeWeb. Bioinformatics pipelines operate on data as they would on other platforms within Genecloud, the only difference being that it has an electronic "policy layer" that determines which datasets are available for computation and which are not, he said.
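As a rough illustration of the "bring the program to the data" idea Carey describes, the following is a minimal Python sketch, not Intertrust's implementation. The names `Policy`, `PolicyLayer`, and `mean_expression`, along with the dataset and user names, are hypothetical and chosen only for this example. It shows a policy check that decides whether a submitted program may compute on a dataset, hides attributes the owner has marked off limits, and logs every attempt.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Policy:
    """Hypothetical per-dataset policy set by the data owner."""
    allowed_users: frozenset
    hidden_attributes: frozenset = frozenset()


@dataclass
class PolicyLayer:
    """Hypothetical gatekeeper: programs are brought to the data, not the reverse."""
    datasets: dict  # dataset name -> list of record dicts
    policies: dict  # dataset name -> Policy
    audit_log: list = field(default_factory=list)

    def run(self, user, dataset, program):
        policy = self.policies.get(dataset)
        allowed = policy is not None and user in policy.allowed_users
        # Record every attempt, whether or not it is permitted.
        self.audit_log.append((user, dataset, program.__name__, allowed))
        if not allowed:
            raise PermissionError(f"{user} may not compute on {dataset}")
        # Strip off-limits attributes before the program sees any record.
        visible = [
            {k: v for k, v in record.items() if k not in policy.hidden_attributes}
            for record in self.datasets[dataset]
        ]
        return program(visible)


def mean_expression(records):
    """Example analysis program submitted by a researcher."""
    return sum(r["expression"] for r in records) / len(records)


layer = PolicyLayer(
    datasets={
        "cohort_a": [
            {"patient_id": "p1", "expression": 2.0},
            {"patient_id": "p2", "expression": 4.0},
        ]
    },
    policies={"cohort_a": Policy(frozenset({"alice"}), frozenset({"patient_id"}))},
)
result = layer.run("alice", "cohort_a", mean_expression)
```

In this sketch the researcher only ever receives the program's result; the records themselves, minus any hidden attributes, stay behind the policy check.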
Last week, the company announced that its first customer for Genecloud is Cambridge, Massachusetts-based immune-technology startup Haystack Bio. The company, which was incorporated early this year but has yet to launch officially, is commercializing single-cell genomics technology that was developed by research groups at the Broad Institute and the Massachusetts Institute of Technology.
Haystack Bio targets biopharmaceutical companies and research institutions that are developing new immunotherapies in areas such as oncology but also vaccine development and neo-adjuvant therapies. It is currently negotiating agreements with two unnamed customers from industry and academia, Haystack Bio CEO Jim Flanigon told GenomeWeb.
Initial applications for Haystack's technology include identifying signatures for companion diagnostics as well as performing quality control for single-cell genomics projects. Further down the road, it could use its tools to help companies identify new therapeutic targets for their drugs. "What we offer ... is the ability to help them take a really precise look at what's going on at the single-cell systems level of a patient sample and put that information together across multiple samples," Flanigon said.
Specifically, Haystack is using Genecloud to host a knowledgebase of information on immune cells and the analytics tools that underlie the company's solutions for the immunotherapy space. The company also offers library preparation services for single-cell RNA sequencing as well as tools for identifying cells of interest in sparse samples. "For an early-stage startup like ourselves ... and the types of partners we are trying to engage with, having a partner like Genecloud is very helpful because it adds a level of credibility and peace of mind that, I think, is hard to get from your typical startup," he said.
Currently, the bulk of the information contained in the knowledgebase is from RNA sequencing, Flanigon said. The platform also includes "a variety of pre-processing, normalization, and analytics tools that we use on that data." The knowledgebase also contains information on cell protein surface markers as well as functional information related to things like cytolytic activity or immune cells' interactions with drugs. The company provides this information from its own experiments but also accepts information from partner companies and institutions. "As long as we know how the experiment was set up, we have written normalization techniques and processing techniques to take that into account," he said.
Genecloud is one of several internal projects that Intertrust has undertaken in recent years that use its data security methods and technologies to handle sensitive data in a number of industries, according to Intertrust's Carey. Intertrust markets computing products and services for data governance and trust management to enterprise software developers, mobile manufacturers, and other companies. Its products, which include tools for digital rights management and software tamper resistance, are largely used by clients in the media and entertainment industry.
Intertrust's interest in genomics started about three years ago. According to Carey, the company was especially interested in working with genomic data because of its sensitive nature and the risks that are associated with exposure. Third parties can, for example, infer from an individual's genomic data phenotypic information that the person may not want made public.
"We started to ask ourselves 'how do some of the inventions that we had created in previous years apply in processing big data? What types of data governance problems appear when you are dealing with other really sensitive information besides media data?'" he said. "We think that there's ... a balance in there where you can allow people to have access to the data who need to do research on it ... and at the same time have that data governed in a way that gives the people whose data it is some assurance that it's not being misused. That fits right in with our company's background."
Another reason for choosing to work with genomic data is that there's much less of a software legacy problem in genomics compared to some other arms of the healthcare space, Carey noted. For example, there are so many different types of electronic health record systems in use today that "making them interoperable and addressing how you would govern this information in a uniform way seemed like a really daunting task," he said. In contrast, "genomics is relatively new [and] the idea of actually governing genomic data is relatively new [so] there was just more opportunity to innovate there."
To learn how best its methods could apply to genomics, Intertrust joined the Global Alliance for Genomics and Health (GA4GH), which is putting in place infrastructure, standards, and formats that will make genomic and clinical data more usable, shareable, and secure. Carey is now a member of the GA4GH's security working group, which is responsible for developing or adopting standards for data security, privacy protection, and access control. "That's helped us get a lot of insight into what challenges people are actually facing in the field," he said. "We didn't necessarily want to speculate as to what the challenges were and solve those, we actually wanted to see what people were dealing with."
Participation in the GA4GH has also helped familiarize the company with the complexities of the biomedical research space. "We found pretty early on that the map of all the different stakeholders in the data is much more complex for healthcare" than it is in the media industry, Carey said. "Even in a very standard clinical care transaction, you have a patient, a doctor, an insurance company, you have public health officials, there's all sorts of people involved. So the value chains are much more complicated."
Also, genomic data is far more personal to people and when leaks occur the stakes are much higher, he noted. "While a media company may want to know that their movie is very well protected, that's a far cry from 'This is my personal information and if you had access to it, you could assess my risks of developing Parkinson's disease,' or something like that."
Intertrust is also developing bioinformatics pipelines that it will include in Genecloud starting with ones for RNA and DNA sequencing, Carey said. Genecloud customers can also create and upload their own pipelines in Docker containers to the platform if they choose to. They can also pull in third-party information from external repositories using Genecloud's application programming interface. Genecloud is also working with GA4GH to implement Beacons, which allow users to share very basic information about their datasets in response to third-party queries.
Intertrust is also exploring ways of letting users incorporate clinical information into their analysis in Genecloud. "Our strategy for incorporating clinical data is to rely on standards like [the] Fast Healthcare Interoperability Resources," Carey told GenomeWeb. "Programs needing access to clinical data can then make web services calls to a well-established API like FHIR. At the point of the calls, we can apply different types of policy management [such as] should the person running the analytics have access to the specific data they are requesting?"
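The idea of applying policy "at the point of the calls" might look roughly like the Python sketch below. It is an assumption-laden illustration rather than Genecloud's actual mechanism: the `ALLOWED_RESOURCES` table, the user names, and the `governed_fhir_get` wrapper are hypothetical, and the `fetch` argument stands in for whatever HTTP client would actually contact a FHIR server. The only FHIR convention it relies on is that a relative request path begins with the resource type, as in `Observation?patient=123`.

```python
from urllib.parse import urlparse

# Hypothetical policy table: FHIR resource types each analyst may read.
ALLOWED_RESOURCES = {
    "alice": {"Observation", "Condition"},
    "bob": {"Observation"},
}


def resource_type(fhir_path):
    """Return the leading resource type of a relative FHIR request path."""
    return urlparse(fhir_path).path.strip("/").split("/")[0]


def governed_fhir_get(user, fhir_path, fetch):
    """Check policy first, then forward the request via the supplied fetch callable."""
    rtype = resource_type(fhir_path)
    if rtype not in ALLOWED_RESOURCES.get(user, set()):
        raise PermissionError(f"{user} may not read {rtype} resources")
    return fetch(fhir_path)


def fake_fetch(path):
    """Stand-in for a real HTTP call to a FHIR server (assumption for this sketch)."""
    return {"requested": path}


response = governed_fhir_get("bob", "Observation?patient=123", fake_fetch)
```

Because the check sits in front of the call, an analysis program that requests a resource type outside its user's policy is refused before any clinical data is retrieved.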
The company is also looking at alternative methods of deploying Genecloud, including providing a version of the solution that can be deployed locally. "We think it's important because for some customers, we anticipate that certain types of data won't leave the premises," Carey said. "So we are looking at ways to ensure that we can loop that data into an analysis as well as [data] that gets uploaded into [the cloud]."
Intertrust has other customers who are interested in using Genecloud but it is not disclosing who they are at this time.