CHICAGO – With the help of a short-term UK government grant and a long-term partnership with a research institute, Lifebit Biotech hopes to make federated sharing of genomic data more widely functional and accessible.
Lifebit recently announced a partnership with the UK National Institute for Health Research (NIHR) Cambridge Biomedical Research Centre (BRC) to build a "trusted research environment" based on the London-based bioinformatics firm's Lifebit CloudOS technology.
Serena Nik-Zainal, genomic medicine lead for the NIHR Cambridge BRC, described the center as a virtual entity of clinical and research scientists who are involved in translational research.
The NIHR Cambridge BRC has named this new trusted research environment Cynapse. It will reside in an Amazon Web Services cloud environment owned by Cambridge University and run by Lifebit CloudOS, a federated genomics operating system that functions as a managed service of individual clouds or high-performance computing centers to support analysis of bioinformatics data across sites and institutions.
Both parties said that the partnership will break down data silos that have hampered biomedical research.
Thorben Seeger, chief business development officer of Lifebit, said that "trusted research environment," or TRE, is a term conceived and defined by independent institute Health Data Research UK, and it is applicable beyond the life sciences. Such an environment is powered by software, but also includes elements such as governance rules and secure access control.
Seeger said that Lifebit's primary business is to provide software to run trusted research environments in the UK and beyond.
Nik-Zainal, a clinical geneticist by training with a PhD in informatics, said that a trusted research environment can be as little as a password-protected entry point to a dataset, though it usually is more secure. The proliferation of such environments across the UK has created silos, even within the University of Cambridge medical campus.
"The problem is getting the TREs to talk to each other or getting datasets to be harmonized so that we can combine information to do research," Nik-Zainal said. "It's just not good for practice with data."
Lifebit already provides bioinformatics technology and a trusted research environment to Genomics England, which supports the National Health Service's Genomic Medicine Service.
NIHR Cambridge BRC chose Lifebit because the company also worked with Genomics England. "It felt like this was an opportunity to try to make that federation connection that is permissive for research to happen [more easily] between Cambridge and Genomics England," Nik-Zainal said.
Nik-Zainal also leads another recent development involving Lifebit, a consortium funded by Data and Analytics Research Environments UK (DARE UK). The DARE UK-backed project, consisting of NIHR Cambridge BRC, University of Cambridge, Genomics England, the Eastern Academic Health Science Network, Cambridge University Health Partners, and Lifebit, received a £200,000 ($271,050) award last month from a government funding entity called UK Research and Innovation to create a "bridge" between clinical data stores and Genomics England's genetic datasets, according to the Cambridge center. The consortium is intended to be an eight-month "sprint" to improve computational infrastructure in biomedicine across the country.
"At the end of the sprint, [we should] be able to demonstrate that data across Genomics England's core of 135,000 whole genomes and many tens of thousands of samples in Cambridge can be jointly queried" in order to build research cohorts, and then the data can be analyzed regardless of its location, Seeger said.
While the idea of federated analytics is not new, Nik-Zainal said that it is not as widely used in medicine because so many people have concerns about data privacy. She said that this project is among the first in the UK to link a public-sector initiative like the 100,000 Genomes Project with an academic research institute without having to move data around.
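As a rough illustration of querying data jointly without moving it, the Python sketch below has each site evaluate a cohort query locally and return only a count. It is not Lifebit's implementation; the `Site` class, the `federated_count` function, and the toy records are all hypothetical names and data invented for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Record = Dict[str, object]


@dataclass
class Site:
    """Hypothetical data-holding site; its records stay inside this object."""
    name: str
    records: List[Record]

    def count_matching(self, predicate: Callable[[Record], bool]) -> int:
        # The query runs where the data lives; only an aggregate count leaves.
        return sum(1 for record in self.records if predicate(record))


def federated_count(
    sites: List[Site], predicate: Callable[[Record], bool]
) -> Tuple[Dict[str, int], int]:
    """Send the same query to every site and combine the per-site counts."""
    per_site = {site.name: site.count_matching(predicate) for site in sites}
    return per_site, sum(per_site.values())


# Toy usage with invented records (not real cohort data).
site_a = Site("site_a", [{"condition": "rare_x"}, {"condition": "other"}])
site_b = Site("site_b", [{"condition": "rare_x"}])
per_site, total = federated_count(
    [site_a, site_b], lambda r: r["condition"] == "rare_x"
)
```

The coordinator in this sketch never sees individual records, only the counts each site chooses to return, which is the property that distinguishes a federated query from downloading and merging the datasets.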
The project's steering committee will include a patient representative from the NHS Patient and Public Involvement and Engagement (PPIE) program. "We're dealing with data [with] sensitive issues and we need to hear their concerns and … worries," Nik-Zainal said.
She said that data privacy is a particularly sensitive issue in the Cambridge area because it was home to Cambridge Analytica, the former consulting firm at the center of the Facebook scandal involving misuse of personal data to target voters in the 2016 US presidential election.
The DARE UK-backed project is essentially a proof of concept with real data. Given Nik-Zainal's background in rare genetic diseases, the use case likely will be cancer-related, she said.
Seeger said that this kind of federated framework is particularly important in rare diseases, where it can be difficult to build suitable cohorts from limited datasets. "To get to statistically relevant amounts of patient data samples, collaboration is critical," he said.
The eight-month sprint under DARE UK will allow Lifebit to refine its technology and build open-source application programming interfaces that follow internationally accepted standards. Seeger said that Lifebit will be using Health Level Seven International's Fast Healthcare Interoperability Resources (FHIR) specification, as well as the Observational Medical Outcomes Partnership (OMOP) Common Data Model.
"There's no question, data standardization is absolutely critical," Seeger said.
Nik-Zainal said that Lifebit is also making sure that Cambridge follows Global Alliance for Genomics and Health (GA4GH) standards.
Because of the short timeframe, the project will focus on application programming interfaces and rules for how trusted research environments should communicate with each other. "Another element is to find novel approaches to governance, in particular, 'airlock' processes," meaning that data never leaves its highly secure host environment, Seeger said.
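One way to picture an airlock rule is as a gate that every export request must pass before anything leaves the secure environment. The Python sketch below is only an assumption-laden illustration, not the consortium's process: `ExportRequest`, `ALLOWED_KINDS`, and `may_export` are hypothetical names, and the list of permitted output types is a guess at what an aggregate-only policy might allow.

```python
from dataclasses import dataclass

# Assumed policy: only aggregate outputs may ever be considered for export.
ALLOWED_KINDS = frozenset({"graph", "summary_statistics"})


@dataclass(frozen=True)
class ExportRequest:
    """Hypothetical request to take an output out of the secure environment."""
    kind: str                   # e.g. "graph", "summary_statistics", "row_level_data"
    approved_by_reviewer: bool  # outcome of a human approval step


def may_export(request: ExportRequest) -> bool:
    # Both conditions must hold: an allowed output type and an explicit approval.
    return request.kind in ALLOWED_KINDS and request.approved_by_reviewer
```

Under this sketch, a request for row-level data is refused even if a reviewer has approved it, and an aggregate output is refused until approval is recorded.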
"It's a process where only results, maybe graphs, maybe summary statistics after an approval process can be exported," he explained.
In contrast, Cynapse is an open-ended collaboration, but because it is being rolled out simultaneously with the DARE UK project, Seeger expects the Cambridge trusted research environment to be live before the end of the second quarter.
NIHR Cambridge BRC will be implementing Cynapse in phases. Initially, the user base will be restricted to a small number of research groups at the University of Cambridge while the partners test the technology and set up a steering committee and a data access review committee, according to Nik-Zainal.
In the second phase, Nik-Zainal expects to open up the Cynapse platform to the entire Cambridge Biomedical Campus, though all individual researchers will be vetted before they can be granted access.
Eventually, Nik-Zainal wants to federate not only with Genomics England, but also with other research cohorts around the country, including UK Biobank, the NIHR BioResource, and the SAIL Databank. "That, of course, will be harder because they will have their own information governance requirements," she said.
While the DARE UK sprint will last just eight months, completing that program is essentially the first goal for Cynapse, according to Nik-Zainal, who expects to uncover a number of issues and potential roadblocks during that short time period.
"I think we'll probably be able to do a first demonstration in the eight months, but presumably we'll need another good 12 months or so to put in place all the processes that we would like to put in to ensure that researchers access data on both sides safely and researchers can perform their research in a safe manner," she said.
Nik-Zainal said that the consortium partners are likely to seek additional funding beyond the eight-month duration of the current project. If the technology works, she explained, "you now have a blueprint to connect Genomics England with any of the other UK university sites and you have a blueprint for then connecting other public-sector projects or TREs with each other to enable scientists to do their work."
"Right now, it's slightly crazy that if I wanted a UK Biobank dataset, I would apply and then download an enormous amount of data and my colleague in the next office could be doing the same thing and we would have copies of the same datasets," Nik-Zainal said.
A well-functioning federated network could open up computational biology to those who might not have access to large IT teams, Nik-Zainal suggested. "I think we need to lower the bar for entry to data exploration," she said. "We should really be trying to enable more people to be able to do data science."
When the infrastructure is in place, researchers like Nik-Zainal will no longer have to download and store large cohorts from Genomics England and other outside sources, and then export data at the end of their research.
"It's kind of poor practice, really," Nik-Zainal said. "There are a lot of unnecessary processes. It could just be made so much easier, so much better as well" through a federated system.
Lifebit also hopes to take its work from the DARE UK project and from Cynapse global. The firm closed a $60 million Series B investment round in September and has been collaborating with the Jackson Laboratory since 2019. The latter gave the firm a foothold in the US.
Seeger said that Lifebit is using the recent funding to expand its US presence and is now seeking to enter the South American market. He said that several announcements on those fronts are imminent.
"A lot of national precision medicine projects and population genomic projects are following in that track of adopting federated architectures so that they can make their own data securely available for research in their own secured environments, as well as connect them to UK cohorts [and] others around the world," Seeger said.