Forensic Breakthrough Stirs NIH to Close GWAS Data from Public View

NEW YORK (GenomeWeb News) – Large amounts of aggregate human DNA data that the National Institutes of Health and other groups had made open to researchers around the world are being removed from public view because of privacy concerns that arose this week, when researchers announced a new forensic DNA method that could conceivably leave people vulnerable to identification.
 
Until now, there was little concern that data in large public databases from genome-wide association studies could be used to identify a single individual among the thousands who gave samples.
 
But a study released this week by the Translational Genomics Research Institute and the University of California, Los Angeles, which is aimed at helping crime solvers identify one person from among many potentially contaminated DNA samples at a crime scene, spurred NIH to close down its publicly available GWAS databases.
 
NIH said today that on Aug. 25 it removed the aggregate statistics files of individual GWAS studies from its public databases, including the Database of Genotypes and Phenotypes (dbGaP), run by the National Center for Biotechnology Information, and the Cancer Genetic Markers of Susceptibility database, run by the National Cancer Institute.
 
The data are still available to researchers who apply for access and agree to protect their confidentiality using the same approach they use for individual-level study data.
 
NIH also confirmed that other groups that have been hosting such public datasets, including the Wellcome Trust Case Control Consortium and the Broad Institute of MIT and Harvard, have also removed the aggregate data from public availability.
 
The TGen and UCLA research shows that it’s possible to use an algorithm and Affymetrix or Illumina microarrays “to find an individual of interest in a mixture of hundreds or even thousands of people’s DNA,” as was reported earlier today in GenomeWeb Daily News. Among the potential real-world applications, the researchers noted that this technique holds particular promise for forensics investigations, since it opens the door to analyzing contaminated DNA samples or sampling a large crime scene area.
 
“Similarly, it might provide closure to families of victims of mass disaster if individual DNA profiles could be identified from mixed samples,” Kathy Hudson, director of the Genetics and Public Policy Center at Johns Hopkins University, said today in an e-mail to GenomeWeb Daily News. “Those are both two great contributions of genome science to society,” Hudson added.
 
This method also could potentially be used to identify individuals from a GWAS-style aggregate database, although, TGen researcher David Craig told GWDN today, it is unlikely any information could have been compromised so far.
 
To dig out one specific profile from within a set, the inquirer would need to have a “highly dense genomic profile” of at least 10,000 specific genetic variations from an individual. That profile of single nucleotide polymorphisms then would be compared against the dataset to measure its uniqueness.
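
The comparison described above can be illustrated with a toy simulation. The sketch below is an assumption-laden illustration, not the TGen and UCLA algorithm, whose details this article does not give: it simulates 10,000 SNPs, a pooled dataset of 50 people, and a separate reference pool, then asks whether a person's profile sits closer to the pooled frequencies than to the reference frequencies. The distance-based score, the pool size, and all function names are hypothetical choices made for the example.

```python
import math
import random


def draw_genotype(rng, freqs):
    """Draw one person's allele fraction (0.0, 0.5 or 1.0) at every SNP."""
    return [((rng.random() < p) + (rng.random() < p)) / 2 for p in freqs]


def pool_frequencies(people):
    """Aggregate (pooled) allele frequency per SNP across a group of people."""
    n = len(people)
    return [sum(column) / n for column in zip(*people)]


def membership_score(profile, pool_freqs, reference_freqs):
    """Illustrative score (an assumption, not the published method).

    Per SNP, measure how much closer the profile is to the pool than to the
    reference, then standardize the mean of those differences. Values near
    zero suggest a non-member; large positive values suggest membership.
    """
    diffs = [abs(y - r) - abs(y - m)
             for y, m, r in zip(profile, pool_freqs, reference_freqs)]
    n = len(diffs)
    mean = sum(diffs) / n
    variance = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    return mean / math.sqrt(variance / n)


# Simulated data only; sizes are arbitrary choices for the example.
rng = random.Random(0)
NUM_SNPS, POOL_SIZE = 10_000, 50
population_freqs = [rng.uniform(0.1, 0.9) for _ in range(NUM_SNPS)]

pool = [draw_genotype(rng, population_freqs) for _ in range(POOL_SIZE)]
reference = [draw_genotype(rng, population_freqs) for _ in range(POOL_SIZE)]
pool_freqs = pool_frequencies(pool)
reference_freqs = pool_frequencies(reference)

member_score = membership_score(pool[0], pool_freqs, reference_freqs)
outsider = draw_genotype(rng, population_freqs)
outsider_score = membership_score(outsider, pool_freqs, reference_freqs)

print(f"member score:   {member_score:.2f}")
print(f"outsider score: {outsider_score:.2f}")
```

In this simulation, the pool member's score should stand well apart from the outsider's even though only aggregate frequencies are consulted. Each SNP contributes only a tiny signal, which is why a dense profile of many thousands of variants is needed.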
 
However complex the method may be, NIH acknowledged that this new analytical tool goes beyond prior expectations, which held that individual profiles would need to be compared one against another to confirm a match. It is now possible to detect a single profile even in pooled data.
 
When Craig alerted NIH to the methods that TGen and UCLA were developing, the agency tested them, began fashioning its policy response, and then notified the Broad Institute and the Wellcome Trust about the vulnerability. Craig noted that he worked with NIH in advance of the study's release, while the agency was preparing for how to handle its public databases.
 
“We knew the implications, and we worked with [NIH] for a while,” Craig said. NIH “took all of this very seriously, even though it really sounds kind of farfetched,” he explained. “They went down a very pre-emptive path.”
 
NIH and other groups conducting GWA studies know that one of the core ethical components of their work, and a critical element of convincing people to participate in these studies, is offering the closest guarantee possible that participants' personal medical and genomic information will not be compromised. As genomics researchers launch major pushes to recruit new people to join GWA studies, those prospective participants will want assurances that no one will pinpoint them and misuse their information in unethical or harmful ways.
 
GPPC’s Kathy Hudson described one of the ethical concerns in her e-mail: “So, the unlikely but concerning scenario is that law enforcement has a DNA sample from a crime scene, searches an NIH database, finds a match and gets a subpoena to identify what researcher provided the cohort data.
 
“While a fairly remote concern, and there are some protections even against subpoena, NIH did the right thing in acting to protect research participants,” she wrote.
 
NIH said today that it is “unaware that [this technique] has been used to compromise any information within NIH GWAS datasets,” and added that the genomics tools required are “not commonly used outside of the research community.”
 
Further, NIH said, “even if an individual’s SNP profile was found within a pooled dataset, all that would be learned is that this profile was contained in the dataset and, thus, it could then be associated with the characteristics of that dataset (e.g., disease or control population).”
 
That is not a concern because the NIH’s GWAS databases do not contain names or other such identifiable information about participants.
 
“The confidence level in the system is very high,” said Laura Rodriguez, who is acting director of the National Human Genome Research Institute’s Office of Policy Communications and Education.
 
Under a new policy adopted earlier this year, which now will cover the aggregate data from GWA studies, “Access is granted for a specific research purpose, for a specific data set, for a specific period of time.”
 
When the news came up that this new method was possible, Rodriguez said, NIH wanted to respond very prudently. “Our goal is to protect confidentiality of the data, and that’s why we took the cautious step,” she said.
 
Rodriguez expects that in the long run this new research will not steer individuals away from participating in GWA studies, and she said that NIH expects to continue to provide access to this data, albeit through a more secure process.
 
She added that NIH is still considering how it will handle these types of databases in the future, and she admitted that “this issue is broader than dbGaP now,” and that NIH is “aware of the bigger picture.” She also admitted that they were taken by surprise.
 
“People really didn’t think this could be true,” she said.
 
 
