NEW YORK – A new European consortium has placed some of the region's largest biobanks and most innovative informaticians and clinicians under a single banner. Called Intervene, the €10.4 million ($12.6 million) project will use genomic and health data from biorepositories along with machine learning methods to develop new clinical predictive tools.
The effort is backed by EU Horizon 2020 funding and is set to end in December 2025. It involves 18 partners, including one in the US, and will oversee multiple pilot studies, as well as the development of an AI-enabled federated data analysis platform. According to the organizers, the project could catapult Europe to a leading position for translating genetic findings into the clinic.
"I think the downstream effects will be that we will show how to link genetic risk data to the clinics," said Samuli Ripatti, a professor of biometry at the University of Helsinki and the principal investigator on the project.
"If we can show how that is done, and how it works as a model, that will have a huge impact downstream," he said.
The official name of Intervene is the International Consortium for Integrative Genomics Prediction. In addition to the University of Helsinki, it includes IBM Research, the European Molecular Biology Laboratory, the University of Siena and the University of Turin in Italy, the University of Tartu in Estonia, the Biobanks and Biomolecular Resources Research Infrastructure Consortium (BBMRI) in Austria, the Technical University of Munich in Germany, the European Cancer Patient Coalition in Brussels, the University of Cambridge and Queen Mary University of London in the UK, the Norwegian University of Science and Technology, Massachusetts General Hospital in the US, and other partners.
The partners will link data from the FinnGen Study, the Estonian Biobank, the Network for Italian Genomes, Genomics England, the UK Biobank, the HUS Helsinki Biobank, the Trøndelag Health Study in Norway, and the Partners Biobank, pooling genomic and electronic health record data on upwards of 1.7 million individuals from across Europe, with the addition of a US partner to diversify the project.
"Our US partner has a diverse biobank, meaning there is a richness in where people originate," said Ripatti. "If we really want to push these tools into the clinics, we obviously need to be able to utilize these helpful tools for various ancestries, not only Europeans, but also in Europe."
Another diverse biobank is the East London Genomes and Health Biobank at Queen Mary University of London. "Their biobank is heavy on South Asian ancestry, so they also contribute to our opportunities for studying individuals with non-European ancestry," he noted.
Investigators aim to link the data in a secure repository and use it to develop a new generation of integrative genetic scores related to several diseases, as well as SARS-CoV-2 infection and severity. As part of the process, they will also liaise with patient advocacy groups and medical societies to develop an ethical and legal framework for making the tools widely available. They also plan to create a data analysis platform called IGS4EU that will allow users to upload and analyze biobank data. The consortium believes this tool could eventually become widely used in clinical research.
"The idea is to bring together the machine learning [and] AI community, as well as those engaged in methods development, the large biobanks, and then the clinicians, in order to develop new tools," said Ripatti. While this has been an area of intense focus for years, resulting in "many strong groups" around Europe involved in developing genetic risk predictors, he said they have been held up by a lack of access to good datasets. The goal of Intervene is to at last bridge this gap.
Much of the data accessed for Intervene is genotyping array data. The FinnGen Study data, for example, was generated on a custom Thermo Fisher Scientific array platform, while the Estonian Biobank data was largely generated using a version of Illumina's Global Screening Array. But sequencing is also involved. The UK Biobank, for instance, continues to churn out whole-exome data on its 500,000-person repository, which has also been genotyped using arrays.
"The biobanks are still relying mostly on array genotypes," noted Ripatti. "That will change over the next couple of years, with the UK Biobank leading the way with exome sequencing." The UK Biobank in October announced that it has sequenced the exomes of 200,000 participants to date.
IGS4EU is a core output for Intervene. The envisioned interface will allow users to analyze their own data using a variety of tools developed by the consortium. It is unclear at the moment who will host the platform, though discussions are underway.
"We want to create a service where people with new algorithms can try them out, so that the biobank data is in the background," said Ripatti. "The data will never leave the biobanks, but the algorithms can be run on the data."
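The arrangement Ripatti describes, in which algorithms travel to the data rather than the other way around, can be illustrated with a minimal Python sketch. It is not the IGS4EU design, which has not been published in detail. The `BiobankSite` class, the `case_score_summary` function, and the toy records are all hypothetical, and a real platform would add authentication, auditing, and disclosure controls that are omitted here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Each record is (genetic risk score, case status), where 1 = case, 0 = control.
Record = Tuple[float, int]


@dataclass
class BiobankSite:
    """Hypothetical stand-in for one biobank; records stay inside this object."""
    name: str
    records: List[Record]

    def run(self, algorithm: Callable[[List[Record]], Dict[str, float]]) -> Dict[str, float]:
        # The algorithm executes locally; only its aggregate output is returned.
        return algorithm(self.records)


def case_score_summary(records: List[Record]) -> Dict[str, float]:
    """Summarize risk scores among cases without exposing individual rows."""
    case_scores = [score for score, is_case in records if is_case == 1]
    return {"n_cases": float(len(case_scores)), "score_sum": sum(case_scores)}


def pooled_mean_case_score(sites: List[BiobankSite]) -> float:
    """Combine per-site summaries into one pooled mean score among cases."""
    summaries = [site.run(case_score_summary) for site in sites]
    total_n = sum(s["n_cases"] for s in summaries)
    total_sum = sum(s["score_sum"] for s in summaries)
    return total_sum / total_n if total_n else float("nan")


# Toy data, invented purely for illustration.
sites = [
    BiobankSite("site_a", [(0.5, 1), (0.1, 0), (1.5, 1)]),
    BiobankSite("site_b", [(1.0, 1), (0.2, 0)]),
]
print(pooled_mean_case_score(sites))
```

In this sketch the coordinator only ever sees case counts and score sums from each site, which is the property the quote emphasizes: the individual-level records are never copied out of the biobank.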
Pilot studies
As part of Intervene, participants will also evaluate their integrative genetic scores for several diseases in three cohorts: two 2,000-patient cohorts in Finland and Italy and a 1,000-patient cohort in Estonia. The Finnish and Italian pilots will evaluate the use of the new risk predictors and genetic counseling for breast cancer, while the Estonian pilot will focus on type 2 diabetes and coronary heart disease.
"These studies will show the additional value of using genetic information on top of our routine health checkup information," said Ripatti.
Reedik Mägi, a professor of bioinformatics at the Estonian Genome Center at the University of Tartu, said that Estonian Biobank participants will be invited to take part in the pilot, which will measure the benefit and feasibility of using genetic risk prediction and genetic counseling.
He underscored the importance of integrating the data from the biobanks before the pilot studies commence. "It is crucial that the health information in participating biobanks is in a comparable format and machine readable," said Mägi. As such, the Estonian investigators will also work to harmonize health and genetic data across all the participating biobanks. Only in this way will Intervene be able to realize its aim, to "create standard procedures for evaluating new disease risk prediction algorithms in biobanks across Europe, making it possible to use them as a standard practice in healthcare," he said.
While the aim is daunting, Ripatti believes it is achievable. "Five years ago, we didn't know how to utilize the polygenic risk scores well, or how to squeeze out the information from the genome," he said, but "now we have got the building blocks" to show their clinical benefit.