NEW YORK (GenomeWeb) – The Global Alliance for Genomics and Health (GA4GH) this week unveiled its strategic plan for the next five years. Called GA4GH Connect, it calls on the alliance's 500-plus members to develop new data sharing standards for use in major international genomic data initiatives.
GA4GH has already agreed to work with 13 initiatives, including Genomics England and Australian Genomics, which it calls driver projects, to develop and release new standards for genomic data discovery, analysis, and interpretation.
The alliance is discussing its plan, set to run through 2022, at the American Society of Human Genetics meeting in Orlando, Florida. The event is being livestreamed today.
In addition to introducing GA4GH Connect, leading members of the alliance have published an editorial on bioRxiv that outlines the expectations and challenges associated with health-driven genomics. The authors maintain that it is the shift away from research-funded genomics toward healthcare-funded genomics that has led GA4GH to develop its strategic plan, as well as to restructure itself into a "delivery-focused organization" supplying driver projects with new tools.
"The key thing to internalize is that practicing medicine will fund many more genomes than practicing research," said Ewan Birney, director of the European Molecular Biology Laboratory's European Bioinformatics Institute in Cambridge, UK, and chair of the GA4GH steering committee.
"The increasing utility of genomics for health means that … most of the population will wind up with their genome sequenced at some point in their lives," Birney said.
As outlined in its editorial, co-authored by Birney, the GA4GH currently estimates that up to 50 million genomes could be sequenced by 2022. The availability of those new datasets will pose new challenges compared to the way data was stored and shared in the past. Whereas genomes sequenced for research purposes were often shared internationally from a single location and downloaded directly by users for analyses, data generated within national healthcare systems will have to be distributed differently, according to Birney.
"With these large healthcare cohorts, we have to invert the paradigm and have a virtualized analysis that gets sent to secure cloud locations around the world," he said. "To do that, we need the standards that the GA4GH is starting up."
The GA4GH rolled out the first of these standards, called htsget, to coincide with the publication of its editorial, as well as its plenary meeting. The htsget standard is a genomic data retrieval specification that enables users to download read data for the parts of the genome that most interest them. Previously, users would typically download whole data sets and then search for those regions of interest, a more costly and time-consuming endeavor.
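To make the region-based retrieval concrete, the following is a minimal Python sketch of how a client might form an htsget reads request and read the JSON "ticket" a server returns. The server address, the read set ID, and the helper names build_htsget_url and block_urls are hypothetical and chosen only for illustration; the request path, query parameters, and ticket layout follow the published htsget specification as best understood here, and a real client would also need to fetch and concatenate the listed data blocks.

```python
from urllib.parse import quote, urlencode


def build_htsget_url(base_url, read_id, reference_name, start, end, fmt="BAM"):
    """Build an htsget reads request for one genomic interval.

    Coordinates are assumed to be 0-based, with `start` inclusive and
    `end` exclusive. `base_url` and `read_id` are placeholders.
    """
    query = urlencode(
        {
            "format": fmt,
            "referenceName": reference_name,
            "start": start,
            "end": end,
        }
    )
    return f"{base_url.rstrip('/')}/reads/{quote(read_id, safe='')}?{query}"


def block_urls(ticket):
    """Return the data block URLs listed in an htsget ticket, in order."""
    return [block["url"] for block in ticket["htsget"]["urls"]]


# Hypothetical server and sample ID, used only to show the request shape.
url = build_htsget_url("https://htsget.example.org", "sample1", "chr1", 100000, 200000)

# A hand-written ticket with the same layout a server would return.
example_ticket = {
    "htsget": {
        "format": "BAM",
        "urls": [
            {"url": "https://data.example.org/sample1/header"},
            {"url": "https://data.example.org/sample1/chr1-block"},
        ],
    }
}
urls = block_urls(example_ticket)
```

In this sketch the client asks only for reads overlapping a 100-kilobase window on chr1, rather than downloading the whole file and filtering it locally.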
It is these kinds of tools that GA4GH aims to fashion by engaging with driver projects.
"These real-world projects include some of the biggest clinical genomics projects around the world," said Birney. "Because we have this level of technical engagement from these groups, we want to make sure that we have standards that are fit for purpose across a very big group of organizations."
Birney noted that GA4GH is specifically trying to design interoperable tools that will allow a "federated analysis" of data between organizations and countries. These tools should enable users to not only better analyze and store datasets, but to integrate them into the clinical management of patients with rare and complex diseases, including cancer.
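The idea of a federated analysis can be illustrated with a toy Python sketch in which the same small analysis is sent to each site and only summary counts leave the site. The Site class, the allele_summary function, and the genotype encoding are assumptions made for this illustration; they are not GA4GH interfaces, and a real deployment would add authentication, access control, and disclosure checks.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Site:
    """A hypothetical data holder; its genotypes never leave the site."""

    name: str
    genotypes: List[int]  # alternate-allele count per individual: 0, 1, or 2

    def run(self, analysis: Callable[[List[int]], Dict[str, int]]) -> Dict[str, int]:
        # The analysis travels to the data; only its summary is returned.
        return analysis(self.genotypes)


def allele_summary(genotypes: List[int]) -> Dict[str, int]:
    """Summarise one site's data as aggregate allele counts."""
    return {"alt_alleles": sum(genotypes), "total_alleles": 2 * len(genotypes)}


def federated_allele_frequency(sites: List[Site]) -> float:
    """Combine per-site summaries into one pooled allele frequency."""
    summaries = [site.run(allele_summary) for site in sites]
    alt = sum(s["alt_alleles"] for s in summaries)
    total = sum(s["total_alleles"] for s in summaries)
    return alt / total if total else 0.0


sites = [
    Site("site_a", [0, 1, 2, 0]),
    Site("site_b", [1, 1]),
]
frequency = federated_allele_frequency(sites)
```

The point of the design is that the coordinating code sees only counts from each site, never individual-level records.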
The 13 initial driver projects participating in the effort are the US National Institutes of Health All of Us Research Program; Australian Genomics; the BRCA Challenge; CanDIG, the Canadian Distributed Infrastructure for Genomics; the US Clinical Genome Resource; ELIXIR Beacon; and the European Nucleotide Archive, European Variation Archive, and European Genome-phenome Archive, all hosted at the EBI.
Also taking part are Genomics England; the International Cancer Genome Consortium for Accelerating Research in Genomic Oncology; the Matchmaker Exchange; the National Cancer Institute Genomic Data Commons; the Monarch Initiative; and the Variant Interpretation for Cancer Consortium.
"The driver projects are a mixed bunch and that is deliberate," said Birney. "That heterogeneity is about clinical practice, such as Genomics England, while some are large cohorts, such as the EGA, ENA, and EVA," he said. "It's a very healthy list."
Birney added that GA4GH has reached out to all major genomics initiatives globally, and that some will be coming onboard as additional driver projects next year. The alliance in 2018 will release a set of criteria whereby other initiatives can join.
To better serve the needs of these driver projects, GA4GH has also restructured internally. Rather than relying on working groups and task teams, as it had since it commenced its activities in 2014, GA4GH will now produce new standards and tools using what it calls technical work streams and foundational work streams. Technical work streams will consist of teams of field leaders engaged in designing and implementing new standards and tools. Foundational work streams will advise on legal, ethical, and data security issues.
Specific technical work streams outlined in the GA4GH Connect five-year plan include groups focused on crafting methods for clinical and phenotypic data capture and exchange, standardizing cloud environments, streamlining researcher identification, designing a unified data discovery platform, and developing standards for accessing and analyzing large-scale genomic data.
Specific foundational work streams mapped out in the GA4GH Connect plan will provide guidance in the areas of legal regulation, ethics, and data security in genomics, both within GA4GH and more broadly. In particular, the alliance's regulatory and ethics arm will attempt to harmonize consent and privacy policies and data governance models.
Both kinds of work streams will be overseen by work stream leads, who will manage the teams to deliver the necessary standards to the driver projects.
"This is a much more mature phase for GA4GH," said Birney. "We have refocused this year on making sure we build standards that are useful in the real world, and the way you do that is by working with the projects that need standards in different areas," he said. "Previously there was a standards group that was separated by the real world implementers," he added. "Unsurprisingly that doesn't work so well."
To fund the activities of GA4GH, the alliance will rely on what Birney termed a "harlequin funding model," where it will "patchwork funding across organizations in different contexts." The three main institutes supporting GA4GH are the Broad Institute, the Wellcome Trust Sanger Institute, and the Ontario Institute for Cancer Research.
"They have provided the resources to get this off the ground and to provide institutional resources to process other patchwork funding," said Birney, noting that the NIH and Genome Canada are also supporting the alliance. Other GA4GH members can apply for funding from various sources while aligning their projects with the goals of the GA4GH.
Should these efforts be successful, Birney hoped that by 2022 there would be widespread uptake of standards for sharing clinical-grade genomic data.
"I want to see our standards ubiquitously used in genomics," said Birney. "Just the same way we use the web and nobody worries about the protocols, that's the goal for researchers and clinicians," he said. "That things will just work for them."