European Countries Step up Efforts to Share Genomic Data as Part of 1+ Million Genomes Initiative

NEW YORK – 2024 is set to be a transformative year for genomics in Europe as national and regional efforts to invest in infrastructure and data collection are gathering momentum.

While the European 1+ Million Genomes initiative might appear to be at the vanguard of genomic data sharing, national projects such as the French Genomic Medicine Initiative 2025 and Germany's Human Genome-Phenome Archive and GenomeDE are also advancing.

Meanwhile, a new project called Genome of Europe, to be funded by the European Commission, has the potential to generate 500,000 genomes from 40 European populations.

In the past, genetics research was based on large, cohort-based studies, according to Ewan Birney, director of the European Bioinformatics Institute and chair emeritus of the Global Alliance for Genomics and Health. But declines in the cost of whole-genome sequencing, coupled with the adoption of genomic medicine over the past decade or so, have created the opportunity to collect and share more data. That data may come from national research projects such as Finland's FinnGen, the Estonian Biobank, or the UK Biobank, or from national clinical genomics services, such as those provided by Genomics England, Genome Medicine Sweden, or the Norwegian Genomic Service. All of this data can now be collected, curated, and made shareable to researchers in Europe, he said, leading to future discoveries.

Making data available for research across borders is not as simple as uploading it all to the cloud, though. Data is often maintained not just at the national but at the provincial or regional level of healthcare systems, and even for established cohorts, it can be difficult to gain access and export findings.

"There are different legal situations in different countries," remarked Ivo Gut, director of the Centro Nacional de Análisis Genómico (CNAG) in Barcelona, one of the largest genome sequencing centers in Europe. "There is no way that the French, for example, will in a million years take all their data and stick it in a cloud that they do not control," he said.

As for accessing Genomics England's dataset, which is available to approved researchers, "it always sounds good on paper, but trying to access it is like going to Fort Knox, being let in the door, and allowed to do your research," said Gut. "You can compute on their system, but you can't export the graphs or download the plots." Researchers get by using screenshots, he said.

Gut is also involved in Spain's Precision Medicine Infrastructure Associated with Science and Technology (IMPACT) project, which aims to provide the country's national health system with infrastructure to support the implementation of genomic medicine. The project is also supporting data integration across Spain's 17 autonomous regions.

Spain currently has three sequencing nodes in Barcelona, Navarre, and Galicia, he said, and a fourth will likely open soon. Similar activities are underway across Europe in various forms, some of which will feed into the 1+ Million Genomes initiative.

Since the declaration of the initiative in 2018, 25 EU countries as well as the UK and Norway have signed on. The initiative has diverse goals, but creating the technical infrastructure to ensure federated access to data tops the list, along with putting the ethical and legal frameworks in place to support data sharing for years to come. All of this will take time, said Birney, likely continuing into the next decade.

Data engineering issues, meanwhile, can be just as daunting. "It's easy to put in a PowerPoint that you want to pseudo-anonymize these genomes and make them available for research," Birney said. "But it will take a team of 10 data engineers to actually make them available."

Juan Arenas leads deployment for the European Genomic Data Infrastructure (GDI) project, a €40 million ($43 million) undertaking, launched in 2022, to create the infrastructure to realize the aims of the 1+ Million Genomes initiative. He also manages projects for ELIXIR, an intergovernmental organization based in the UK that coordinates and develops European life science resources across 23 national nodes and the European Molecular Biology Laboratory.

According to Arenas, GDI this year aims to demonstrate data access across five or more countries that have implemented the requisite workflows for data governance. The demo will use synthetic genomic and phenotypic data for 2,500 fictitious individuals and cover disease areas including rare diseases, cancer, and infectious diseases.

Researchers will be able to identify data available across countries that match their study requirements, apply with a single form, and get access to the data in a secure processing environment.
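The discovery step of such a federated setup can be illustrated with a minimal Python sketch. It assumes each national node keeps its records locally and answers a query with an aggregate count only. The class and function names (`Record`, `NationalNode`, `discover`) and the toy records are hypothetical and do not reflect the actual GDI software or data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """One synthetic individual; fields are illustrative, not a GDI schema."""
    disease_area: str
    has_wgs: bool


class NationalNode:
    """Hypothetical stand-in for a country's node; records never leave it."""

    def __init__(self, country, records):
        self.country = country
        self._records = list(records)

    def count_matches(self, disease_area, require_wgs=True):
        # Only an aggregate count is returned, never row-level data.
        return sum(
            1
            for r in self._records
            if r.disease_area == disease_area and (r.has_wgs or not require_wgs)
        )


def discover(nodes, disease_area, require_wgs=True):
    """Ask every node the same question and collect per-country counts."""
    return {n.country: n.count_matches(disease_area, require_wgs) for n in nodes}


# Toy synthetic data, invented for this example.
nodes = [
    NationalNode("Finland", [Record("rare disease", True), Record("cancer", True)]),
    NationalNode("Spain", [Record("rare disease", True), Record("rare disease", False)]),
]
counts = discover(nodes, "rare disease")
print(counts)
```

In this sketch, a researcher would see only how many matching records each country holds. Applying for access and analyzing the data inside a secure processing environment are separate steps that are not modeled here.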

All countries that are part of the GDI and have their node set up in at least a preproduction environment can join the demo, Arenas said. At the moment, they are expected to include Belgium, Finland, Luxembourg, Spain, Sweden, the Netherlands, and Norway. "Other countries that have been making significant progress on their technical deployment could be in a position to join, too," he added.

Meanwhile, the European Health Data Space may complement genomics efforts with clinical data sharing. Proposed by the European Commission in 2022, EHDS will support national and EU-wide access to electronic health records for research, policymaking, and regulatory activities. The Commission expects EHDS to launch in 2025.

According to Arenas, the 1+ Million Genomes initiative will allow researchers to run queries across national cohorts, find the data necessary for their studies, and apply for access via a single form. An EU-level data access committee could grant approval within a month, allowing researchers to run their own analyses. "1+MG will be part of EHDS as a data-sharing infrastructure, one of the EHDS node types anticipated in the legislative proposal," he said.

Birney described EHDS as a "big move" that is modeled on how healthcare data has been harmonized and made accessible across the Nordic countries. He said that the EHDS legislative proposal has created an opportunity for more conversations around data sharing efforts in Europe. "1+ Million Genomes and EHDS touch," he said. "There is quite a lot of conversation between those two efforts."

Another project that touches European genomics is the European Open Science Cloud, an environment for hosting and processing European research data. Last year, a new EOSC-related project called EOSC4Cancer commenced, with the aim of making different types of cancer data accessible for research, including genomics, imaging, and medical data.

Arenas said GDI is collaborating with EOSC4Cancer, as well as another European project called the European Cancer Imaging Initiative (EUCAIM) that aims to create a federated European infrastructure for cancer imaging data. Together, the projects are working to build technical services to enable AI-based analysis of cancer images, genotyping data, and health data, Arenas said.

Franco-German motor

While the UK and Spain have been at the forefront of genomic data sharing among larger European countries, France and Germany have been motoring along with their own efforts. Combined, the two countries have a population of 150 million people, making data integration more cumbersome than in smaller countries like Estonia, Finland, and Iceland that have piloted population-scale genomics projects.

Germany and France did not initially sign the 1+ Million Genomes declaration in 2018 but joined the initiative in 2020 and 2022, respectively.

In Germany, scientists are currently setting up the German Human Genome-Phenome Archive (GHGA), a national omics data infrastructure that will provide a framework for the "use of human genome data for research, while preventing data misuse," according to Ulrike Träger, communications officer for GHGA at the German Cancer Research Center (DKFZ) in Heidelberg.

GHGA involves 21 institutions across Germany and is funded through the German National Research Data Infrastructure project, Träger said. While it is devoted to sharing genomic data for secondary research, a separate initiative called GenomeDE, launched in 2019, aims to bring genome sequencing into the standard of care this year through a national pilot project. Data consented for research use will be stored within the GHGA.

Germany consists of 16 federal states, each of which enjoys a great deal of autonomy, and data sharing within Germany requires adhering to the data protection laws of each state, as well as the EU's General Data Protection Regulation.

Another issue German researchers must deal with is mindset, as Germans are known to be very protective of their personal data. Underscoring this, a 2020 article in Nature reported that Germans taking part in the study Your DNA, Your Say were less willing than their English counterparts to make their data available for research. On the other hand, a recently published survey of citizens in Germany and Israel, conducted by the German Israeli Health Forum for Artificial Intelligence, found that 82 percent of Germans are willing to contribute their anonymized patient data for medical research.

Late last year, the German parliament (Bundestag) also passed a new law to improve the use of health data for research and development, including genomic data from its national pilot project. According to that law, data from the pilot will be held in a central platform, to be established by the Federal Institute for Drugs and Medical Devices (BfArM).

Träger acknowledged that while Germany has a "considerable track record" in genomics, it does face challenges when it comes to genomic data sharing. GHGA is participating in establishing a German node for the GDI project to make data available to the 1+ Million Genomes initiative in the future, she said. It is also preparing for its first data release later this year, GHGA 0.9, which has undergone extensive testing as well as external stress tests.

The French Genomic Medicine Initiative 2025, meanwhile, is also continuing to work toward its national aims while interacting with the 1+ Million Genomes initiative, according to Frédérique Nowak, the project's operational coordinator. While French healthcare is "very centralized," its hospitals are autonomous, and an effort is underway to harmonize the informatics systems used to store electronic health records, she said.

The initiative, launched at the request of the French government in 2016, has seen the deployment of two national sequencing centers, one in Paris called SeqOIA and another in Lyon called Auragen. Both support the implementation of whole-genome sequencing in French healthcare, for rare diseases, cancers, and some common conditions. According to Nowak, the two sequencing labs can currently sequence up to 6,000 patients each year.

There are also four pilot projects underway as part of the initiative, focused on rare diseases, cancer, diabetes, and genetic variation in the French population. All data will eventually be accessible through the nodes established by the GDI to support 1+ Million Genomes, she said.

According to Nowak, it is unclear whether France's initiative will be extended after 2025 or whether a new initiative might be announced.

Genome of Europe

Another new project, called Genome of Europe, is underway following a call for applications by the European Commission last year. It aims to create a European reference genome database of genetic variation based on whole-genome sequencing data for at least 500,000 Europeans representing the population of Europe.

The project should support disease research, and the reference data could be used for data imputation and enrichment of genotype information, the Commission said in its call. The half-million genomes will contribute to the million genomes aimed for in the 1+ Million Genomes initiative.

According to Arenas, the new project will make use of the nodes established by the GDI. The first arm of Genome of Europe will acquire whole-genome sequencing and whole-exome sequencing data from about 100,000 healthy participants. He said the Commission would likely provide €20 million for Genome of Europe for the first part of the project, with participating countries expected to contribute an additional €25 million.

With "robust background" on Europe's population groups, scientists could devise population-specific polygenic risk scores, said CNAG's Gut. "You could then test an entire population with an array and provide back risk scores in relation to genomic background and profile," he said. Funding for the project has not yet been formally awarded, but an announcement is expected soon.