NEW YORK (GenomeWeb) – Nearly a year after opening a comment period on a draft policy proposal for how researchers should share data from human and non-human genomics research projects, the National Institutes of Health today released its finalized policy on the issue.
The new policy, which goes into effect early next year, builds on and replaces the NIH's initial Genomic Data Sharing policy, which was issued in 2007 to promote the sharing of data from genome-wide association studies through the creation of the database of Genotypes and Phenotypes (dbGaP). With dbGaP, the NIH created a two-tiered system for distributing data: an open-access level with no restrictions, and a controlled-access level for data that can be used only for research purposes consistent with the original informed consent under which the data were collected.
A key part of the new policy is focused on expectations that researchers obtain informed consent from study participants for the potential future use of their de-identified data for research and for broad sharing. The NIH also has similar expectations for studies involving the use of de-identified cell lines or clinical specimens.
The existing two-tiered system, which provides access to human data based on data sensitivity and privacy concerns, continues under the new policy, according to the NIH. In addition, researchers using controlled-access data will be expected to use the data only for the approved research; protect data confidentiality, which includes not sharing the data with unauthorized people; and acknowledge data-submitting investigators in presentations and publications.
Institutions submitting data to dbGaP are expected to certify that the data were collected legally and ethically, and that personal identifiers, such as names and addresses, have been removed. Study investigators and their institutions also are expected to provide basic plans for following the GDS policy as part of their funding proposals and applications.
Under the new GDS policy, researchers are encouraged to "seek the broadest possible sharing permissions" from study participants for future use of their data, the NIH said.
It also said data submissions and access should promote timely and broad data sharing, and non-human genomic data should be made publicly available no later than the date of initial publication. The NIH noted that non-human genomic data can be deposited into NIH-designated repositories other than dbGaP.
The new policy also encourages "the broadest possible use of findings and development of products/technologies from the use of NIH-funded genomic data to promote maximum public benefit," said the NIH.
Starting with funding applications submitted for a Jan. 25, 2015, receipt date, the new GDS policy will apply to all NIH-funded, large-scale human and non-human projects that generate genomic data.
"Advances in DNA sequencing technologies have enabled NIH to conduct and fund research that generates ever-greater volumes of GWAS and other types of genomic data," Eric Green, NHGRI director and a co-chair of the trans-NIH committee that developed the GDS policy, said in a statement. "Access to these data through dbGaP and according to the data management practices laid out in the policy allows researchers to accelerate research by combining and comparing large and information-rich datasets."
Members of the NIH Genomic Data Sharing policy team also authored a report, published in Nature Genetics today, providing statistics on the usage of dbGaP under the previous GDS policy, which focused on genome-wide association studies.
They analyzed data from 304 dbGaP studies deposited from the beginning of the GWAS GDS policy six years ago through Dec. 1, 2013.
More than 2,200 investigators and around 6,800 collaborators from 41 different countries, though primarily from North America, received dbGaP data during that time frame. In all, 17,746 requests were submitted, of which 12,391, or 69 percent, were approved. The authors noted that the most common reason for not approving a request was inconsistency between the proposed research and the data use limitations of the requested data set.
The authors also noted that, although rare, several violations of the GDS policy occurred during the six-year period. "These involved errors in assigning data use limitations during data submission, investigators sharing controlled-access data with unapproved investigators, and investigators using data for purposes not described in the research use statement," they wrote, adding that the NIH took appropriate steps to address the violations as soon as it learned of them. "Fortunately, to our knowledge, no participants were harmed," they said.
The authors also noted that with the increasing volume and complexity of genomic data, there is an "urgent need" for alternative data management and analysis mechanisms beyond dbGaP. They cited The Cancer Genome Atlas program, under which the NIH contracted with external organizations to serve as "trusted partners" for data management. The Cancer Genomics Hub is an example of this model.
"Several pilot programs are underway to explore secure, cloud-based systems for managing human-derived data," they said. "The outcome of these pilots will be considered by NIH leadership in advancing data-sharing policies and other data science initiatives, such as the Big Data to Knowledge initiative."