NEW YORK (GenomeWeb) – As institutions and governments continue to launch large-scale initiatives that aim, ultimately, to use genomic data in clinical contexts, the Global Alliance for Genomics and Health is responding by prioritizing tasks and setting specific goals aimed at addressing potential issues that could limit data use for this purpose.
The organization released a strategic roadmap last week that describes in some detail its priorities and goals for 2018. The document, which prioritizes tasks around data sharing and systems interoperability, as well as regulatory and privacy concerns, describes deliverables across eight workstreams: clinical and phenotype data capture; cloud; data use and researcher identities; data security; discovery; genomic knowledge standards; large-scale genomics; and regulatory and ethics. It is one of the early fruits of GA4GH Connect, a five-year initiative that the organization launched late last year to better align its objectives with the key needs of the international genomic data community.
The GA4GH grew out of the genomics community's need for collaboration, interoperability, and ways to share data responsibly in the research context. Addressing that need remains at the core of its activities, according to GA4GH CEO Peter Goodhand. Using genomic data in clinical contexts poses similar challenges, with added complications: stricter requirements for data security and privacy, questions of informed consent and data use, and restrictions on moving data across geographic boundaries.
"We took a long hard look at how we were organized and, more importantly, what we were focused on and how coherent those things were," Goodhand said. GA4GH leaders also considered how much genomic data may become available in the near future and where those datasets will come from. "Significantly more of them will come from healthcare not traditional research and that changes what you do," he said. All that "caused us to take a long hard look at the global alliance … and organize in a more deliberate and more coherent way."
In drafting the 2018 roadmap, the organizing committee asked for input from participants in 15 current GA4GH driver projects, whose involvement was "critical" in setting the year's priorities, according to Ewan Birney, who was hired as GA4GH chair in 2016. Their suggestions allow the organization to be more "deliberate and directed" in its efforts to address large-scale genomic data sharing, he said.
The driver projects that were part of the early phase of the GA4GH "were great demonstrations of the importance and value of data sharing but they weren't particularly good demonstrations of most of the tools that the Global Alliance was developing because they were done concurrently," Goodhand explained. Under the new plan, the organization has grouped the driver projects according to eight workstreams that cover different aspects of genomic data sharing, including clinical and phenotypic data capture, the regulatory and ethical concerns associated with data sharing, and data security.
"It's bringing a real structure and discipline to what we do," Goodhand said. This structure is crucial for enabling interoperability across systems and institutions, and key to establishing well-documented, well-tested data-sharing standards that everyone agrees on. "We're asking every one of those workstreams to dig a bit deeper, put more flesh on the bones, and start to put some stakes in the ground and say 'this is what we will do in the next year or two.'"
But at the same time, the roadmap is "not carved in stone," he noted. GA4GH members can suggest changes and updates that, if approved, can be added to the roadmap. Furthermore, on the regulatory side, "we are still very much focused on frameworks and policy development and that has to dovetail with the technical standards," he added.
Contributions to the roadmap came in from members of more established initiatives such as the GA4GH's Beacon project, which lets participating sites share genomic data securely via a network of services installed locally at different institutions. Project members also develop an application programming interface that provides specifications for querying genomic, phenotypic, and clinical data. Early last year, the GA4GH partnered with the European Life-Sciences Infrastructure for Biological Information (ELIXIR) initiative to establish beacons at multiple sites that will make it easier to search European genomic datasets as well as develop protocols for securely sharing phenotype data.
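The Beacon idea can be illustrated with a minimal sketch. It assumes the simple model that early Beacons used, in which a site answers only whether it holds a given allele and never returns the underlying records. The `Allele` and `LocalBeacon` names, the fields, and the sample variant are hypothetical placeholders, not the Beacon API specification or real data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Allele:
    """Hypothetical allele description: chromosome, position, reference and alternate bases."""
    chromosome: str
    position: int
    reference: str
    alternate: str


class LocalBeacon:
    """Toy beacon: keeps variant records on site and exposes only a presence check."""

    def __init__(self, name, variants):
        self.name = name
        # Frozen dataclasses are hashable, so a set gives fast exact-match lookups.
        self._variants = set(variants)

    def exists(self, query: Allele) -> bool:
        # Answer yes or no; no record, sample, or patient detail is returned.
        return query in self._variants


if __name__ == "__main__":
    site = LocalBeacon("site-a", [Allele("17", 100, "A", "G")])
    print(site.exists(Allele("17", 100, "A", "G")))  # True
    print(site.exists(Allele("17", 100, "A", "T")))  # False
```

Answering only "yes" or "no" is what lets a site advertise that it holds relevant data while keeping the data itself local; a researcher who gets a "yes" can then pursue access through that site's own procedures.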
Other key driver projects include the Matchmaker Exchange, which was developed by scientists from Harvard Medical School and elsewhere to connect disparate databases of genomic and phenotypic information. It makes it easier for users to search for and retrieve information on genes and phenotypes of interest, and to connect with colleagues who are working on similar projects.
Finally, the BRCA Challenge was launched in 2016 to provide access to data on BRCA1/2 variants from public repositories around the world through the BRCA Exchange, which, according to its developers, is the largest source of public information on BRCA1 and BRCA2 variants. The most recent statistics from its site show that the portal now contains over 19,000 unique variants. Nearly 8,000 of these are BRCA1 variants, while some 11,000 are BRCA2 variants. Of the 19,000, about 6,100 have been classified by experts involved in the Evidence-based Network for the Interpretation of Germline Mutant Alleles (ENIGMA) consortium. About 3,700 are expert-classified as pathogenic, 1,200 as benign, and 1,100 as likely benign.
The GA4GH's roster of driver projects also includes large-scale initiatives such as All of Us, the National Cancer Institute's Genomic Data Commons, Genomics England, the Monarch Initiative, and ClinGen.
On the regulatory side, plans for the year include providing ethical, legal, and policy guidance to research policymakers and projects to help guide decisions about what information to share with research participants regarding genomic findings relevant to their health. The roadmap also calls for harmonized policies and requirements governing access to cloud-based genomic and health-related data in both research and clinical contexts. Other efforts focus on understanding and interpreting global data privacy laws and helping to develop a code of conduct for health-related data, as well as developing best practices for detecting, assessing, and responding to data breaches involving GA4GH standards.
There is precedent for cross-country collaboration, at least in the research context, from projects such as the International Cancer Genome Consortium and the HapMap Project. In the future, as more healthcare-related data is generated, concerns about patient privacy and data security, differences between the systems used by hospitals and clinics, and compliance with state and national laws will become increasingly pertinent. Furthermore, the sheer size of the data that could be available by 2022, likely millions of genomes, could make moving it impractical.
Offering federated access to the data is a viable alternative that circumvents these issues, and it is the data-sharing model the GA4GH has embraced, according to Birney. "We think the medical data will remain in country-level points of aggregation and then researchers might have access to that data through [the use of] virtual machines, [for example]," he said in an interview.
Goodhand expressed similar sentiments. "If it is not legally allowed or it is not practical to move the data from one place to another, leave the data where it is, secure it, set the rules that determine access and then send the questions to the data," he said. Critical to making this interaction possible is ensuring that the data in question can be located and that the queries sent from one system work in exactly the same way when they interact with the receiving infrastructure. "And that's where the interoperable standards come in." The partnership between Beacon and ELIXIR is one such effort to address this issue.
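The "send the questions to the data" pattern Goodhand describes can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a GA4GH standard: it assumes every site accepts the same hypothetical `Query` format, applies its own access rule (here, a hypothetical set of allowed requester roles), and returns only a count rather than records. The `Site` class, `federated_count` function, roles, and records are all made-up placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Query:
    """Hypothetical shared query format that every site interprets identically."""
    gene: str
    requester_role: str


@dataclass
class Site:
    name: str
    records: list  # stays at the site; only aggregate answers leave it
    allowed_roles: frozenset  # this site's own access rule

    def answer(self, query: Query):
        # Each site enforces its own rules before touching the data.
        if query.requester_role not in self.allowed_roles:
            return None  # request refused under local rules
        # Return a count, never the records themselves.
        return sum(1 for record in self.records if record["gene"] == query.gene)


def federated_count(sites, query):
    """Send the same query to every site and collect the per-site answers."""
    return {site.name: site.answer(query) for site in sites}


if __name__ == "__main__":
    sites = [
        Site("site-a", [{"gene": "BRCA1"}, {"gene": "BRCA2"}], frozenset({"researcher"})),
        Site("site-b", [{"gene": "BRCA1"}, {"gene": "BRCA1"}], frozenset({"clinician"})),
    ]
    print(federated_count(sites, Query("BRCA1", "researcher")))
    # {'site-a': 1, 'site-b': None}
```

The shared `Query` type stands in for the interoperable standards Goodhand mentions: because both sites interpret the same query the same way, the coordinator can combine their answers, while each site decides for itself whether to respond at all.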
Developments on the technical side will include a system for establishing user identities and credentials that will govern access to private data, as well as ontologies and terminologies that provide standards for capturing and exchanging information between systems. Technical efforts will also focus on extending existing standards for representing genomic and variant data, namely the SAM, BAM, CRAM, and VCF formats.
The 2018 roadmap is the first of several documents that the alliance plans to release over the next five years as part of the GA4GH Connect initiative. Like the current roadmap, future documents will detail the organization's development work with the community and comment on the most pressing standards and policy needs in genomic data sharing.
The GA4GH plans to expand the list of driver projects contributing to its activities, and to that end it has tapped one of its executive group members to lead partner engagement efforts. Goodhand estimates that there are more than 100 potential driver projects globally, including the 15 currently working with the GA4GH, that could contribute meaningfully to the GA4GH's efforts. "We wanted to digest the first 15 and make that work … and then over time, we'll be adding a more diverse group of driver projects [that include] disease and geographic diversity."