NEW YORK (GenomeWeb) – The National Institutes of Health announced today that it has awarded $9 million in fiscal 2017 to fund 12 projects kicking off the initial phase of an effort to build a cloud-based platform for storing, sharing, accessing, and computing on biomedical data and other digital objects, such as analytical tools.
Called the NIH Data Commons Pilot Phase, the program is part of the agency's Big Data to Knowledge (BD2K) initiative, and aims to make biomedical research data findable, accessible, interoperable, and reusable (FAIR) for researchers. Earlier this year, the NIH announced that it was looking for participants for the four-year pilot phase, which will involve three high-value datasets that serve as test cases for the principles, policies, processes, and architectures that need to be developed for the BD2K program.
The NIH has now selected nine groups of collaborators from industry and academia to form the NIH Data Commons Pilot Phase consortium and begin developing the capabilities required for the planned data commons, including making data transparent and interoperable, safeguarding patient data, and getting community buy-in for data standards.
According to the NIH, the funding recipients include the University of Maryland and the University of Oxford e-Research Center, which are working together to create the NIH Data Commons Facilitation Center; an Icahn School of Medicine-led group, which is creating a plan for the development and implementation of community-supported FAIR guidelines and metrics; Harvard Medical School, which is building a patient-centric information commons under FAIR principles; a University of Chicago-led team, which is building a platform for continuous FAIRness; a partnership between the University of California, Davis and Curoverse Innovations to develop tools and workflows for mining genomic data on many clouds; and collaborators led by the University of California, San Diego, who are building a cloud-agnostic architecture for locating indexed FAIR objects and safely reusing them in new integrated analyses.
The University of California, Santa Cruz has partnered with the Broad Institute and the University of Chicago in a group dubbed the Commons Alliance. According to a UCSC press release, the partnership plans to build a platform designed to handle a heterogeneous mix of data types, including genomics, transcriptomics, and image data, along with associated metadata. Their ultimate goal is to build a set of common software modules for creating interoperable systems, which could all reside within a common cloud-based research environment.
Seven Bridges Genomics also announced today that it will lead a group under the Data Commons Pilot, working with Repositive, Elsevier, and the Boston Veterans Affairs Research Institute on a project called FAIR4CURES to build a full-stack solution that unifies data from a variety of research environments into a single ecosystem. The group will create interoperable APIs to connect biomedical data from the Cancer Genomics Cloud and Gabriella Miller Kids First Data Center to additional NIH datasets such as the Trans-Omics for Precision Medicine, Genotype-Tissue Expression, and the Model Organism Databases datasets, and will contribute access to additional data from Repositive's platform, Elsevier's Mendeley data hub, and the VA's GenHub Ecosystem.
And the Jackson Laboratory, which is part of a larger group of institutions participating in a project entitled A Collaboration for the NIH Data Commons, announced today that its particular contribution will include software specifically focusing on cardiomyopathy. This new online Disease Navigator will enable researchers who study cardiovascular disease to fast-track their research by accessing relevant genomic and other data from animal models cross-referenced to human data. The Disease Navigator will be developed in conjunction with a consortium of model organism databases called the Alliance of Genome Resources, the Jackson Lab said.
Additionally, the NIH has awarded supplemental grant funding to the stewards of the three datasets being used as test cases (the Broad Institute, Stanford University, and collaborators at the University of Michigan and the University of Washington) to facilitate their participation in the pilot program.
The National Heart, Lung, and Blood Institute has also contributed additional funding to the projects led by the University of North Carolina at Chapel Hill, Seven Bridges, Harvard, and UC Santa Cruz.
"Harvesting the wealth of information in biomedical data will advance our understanding of human health and disease," NIH Director Francis Collins said in a statement. "However, poor data accessibility is a major barrier to translating data into understanding. The NIH Data Commons Pilot Phase is an important effort to remove that barrier."