NEW YORK — The European Genomic Data Infrastructure (GDI) project recently released a series of new tools that European data centers can use to prepare themselves for providing access to whole-genome data as part of the 1+ Million Genomes initiative.
The GDI starter kit is the first outcome of the €40 million ($43.5 million) project, which commenced last year. GDI, which involves 54 partners from 20 countries, aims to create the infrastructure needed to realize the goals of 1+MG. That initiative was announced in 2018 with the initial goal of making at least a million human genomes accessible across Europe by 2022.
While that objective was not realized, in part because participating countries shifted their focus during the COVID-19 pandemic, the number of signatories to the 1+MG declaration has grown to 25 EU countries, plus the UK and Norway. The EU first committed €4 million to a project called Beyond 1 Million Genomes (B1MG) in 2020 to support the development of guidelines and strategies for realizing the goals of the initiative. The purpose of GDI, however, is to establish infrastructure for access to genomic data.
Serena Scollen, head of human genomics and translational data at ELIXIR, the Hinxton, UK-based European Life Sciences Infrastructure for Biological Information organization, acknowledged that making more than a million genomes accessible across Europe was "always going to be an ambitious initiative," adding that "great progress" had been made by both the B1MG and GDI projects, as well as at the national level.
"The two implementation projects supporting the 1+MG initiative have advanced the work from a design and testing phase to the scale-up and sustainability phase," Scollen said of B1MG and GDI.
B1MG, which will run until September, intends to make all guidelines it produced available within one framework. Most are already available online.
Another project, starting next year, will generate about 80,000 genome sequences representative of the European population, which will serve as a reference dataset for researchers, Scollen said, adding that a roadmap for that project will be available soon.
Starter kit
The new GDI starter kit will support federated data access workflows. While it is intended for the national GDI nodes — the data hubs that will make genomes available between the signatory countries — the developers believe that other public institutions as well as companies could make use of the kit in the future.
The kit is a collection of software applications and components that allows countries to access, for now, synthetic genomic and phenotypic data across borders, according to Scollen. It was codeveloped by the 20 GDI nodes and is based on standards from the Global Alliance for Genomics and Health (GA4GH), a decade-old nonprofit that develops standards around genomic data.
The tools support the "five functionalities required by the 1+MG data infrastructure," namely data discoverability, data access management, storage and interfaces, data reception, and data processing, she said.
Included in the kit are more than 2,500 synthetic genomes and accompanying phenotyping data with relevance to cancer, rare disease, and population genomics research. Using that data, GDI nodes should be able to run pilot projects, with the goal of sharing real data across borders in the future.
Most of the tools in the starter kit are open source and can be operated as services within national hubs. Among them are the Life Science Authentication and Authorization Infrastructure (AAI) and the Beacon Network, which serves as a data discovery platform. They were developed in collaboration with other projects, such as the Federated European Genome-Phenome Archive (EGA), in part to ensure interoperability between the starter kit and other genomic data-sharing platforms.
The kit components will be further developed, Scollen said, for example, to enable better data protection and to support rare disease, cancer, common complex disease, infectious disease, and pharmacogenomics research.
Industry, with the relevant permissions, is also considered a potential partner in the data infrastructure. Scollen cited the example of DNAstack, a Toronto-based company that provides a cloud-based platform for genomic data sharing and has invested in developing Beacon to facilitate data discovery.
"We would like to see companies in Europe look towards developing tools for federated data analysis that could run within the federated infrastructure," she said, "or perhaps being able to support the movement towards genomics in a healthcare setting." Industry could also develop new products that streamline the process from data generation to acquisition, analysis, and enrichment, she added.
While 1+MG has not achieved its goal of making a million European genomes shareable by 2022, data for nearly a million human genomes exists in Europe today. However, legal hurdles still prevent, say, an Irish researcher from accessing Finnish genomes, and vice versa.
Scollen said the collaborators are seeing progress both at the country level and through coordinated European approaches that will reduce these obstacles. To support this, B1MG last year produced a tool that can help countries determine where to invest in genomic data resources.
Also, the European Health Data Space proposal, currently being considered by the European Commission, envisions a health-focused ecosystem of rules, common standards and practices, infrastructure, and a governance framework. The EHDS proposal will require that health data collections, including genomic data, be discoverable and accessible, Scollen noted, and will help GDI to "develop a trustworthy setting for secure access to data."