Research Exposes Vulnerabilities in Security of Public Genomic Databases

NEW YORK (GenomeWeb News) – A team of researchers from the Whitehead Institute and their collaborators have published a study in Science that shows it is possible to identify research study participants from de-identified genetic data.

The researchers identified nearly 50 men and women who had submitted samples and had their genomes sequenced for a study done by the Center for the Study of Human Polymorphisms (CEPH).

According to the paper, the researchers identified the participants by matching short tandem repeats they found on the Y chromosomes of men in the CEPH study to Y-STRs in publicly available genetic genealogy databases. Because Y-STRs are linked to surnames in genetic genealogy databases, the researchers were able to recover the family names of men in the CEPH dataset who had submitted their Y-STRs to these repositories.
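
As a rough illustration of the matching step described above, the following Python sketch compares a Y-STR profile against a small lookup table of surname-linked profiles. The marker names are real Y-STR loci, but the repeat counts, the surnames, the `GENEALOGY_DB` table, and the `infer_surname` helper are all invented for illustration. The article does not describe the researchers' actual software or matching criteria, so this is a toy model under those assumptions and not their method.

```python
from typing import Dict, Optional

Profile = Dict[str, int]  # marker name -> repeat count

# Hypothetical genealogy records (invented values, not real data).
GENEALOGY_DB: Dict[str, Profile] = {
    "Smith": {"DYS19": 14, "DYS390": 24, "DYS391": 11, "DYS393": 13},
    "Jones": {"DYS19": 15, "DYS390": 23, "DYS391": 10, "DYS393": 13},
}


def count_mismatches(a: Profile, b: Profile) -> int:
    """Count markers present in both profiles whose repeat counts differ."""
    shared = a.keys() & b.keys()
    return sum(1 for marker in shared if a[marker] != b[marker])


def infer_surname(
    profile: Profile, db: Dict[str, Profile], max_mismatches: int = 0
) -> Optional[str]:
    """Return the surname of the closest record within the mismatch limit.

    Returns None when no record is close enough. Assumption: a simple
    mismatch count stands in for whatever scoring a real search would use.
    """
    best_name: Optional[str] = None
    best_score: Optional[int] = None
    for surname, record in db.items():
        if not (profile.keys() & record.keys()):
            continue  # no markers in common, so nothing to compare
        score = count_mismatches(profile, record)
        if score <= max_mismatches and (best_score is None or score < best_score):
            best_name, best_score = surname, score
    return best_name


if __name__ == "__main__":
    query = {"DYS19": 14, "DYS390": 24, "DYS391": 11, "DYS393": 13}
    print(infer_surname(query, GENEALOGY_DB))
```

In this toy model a surname is only a candidate lead. As the article goes on to describe, the researchers combined recovered surnames with other public records to track down participants.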

Armed with this information, they searched other free online information sources including record search engines, obituaries, genealogy websites, and public demographic data from the National Institute of General Medical Sciences' Human Genetic Cell Repository, which is housed at the Coriell Institute, and were able to track down the participants.

In a statement, Whitehead Fellow Yaniv Erlich, who led the research team, said that while the findings reveal "the potential for breaches of privacy in genomics studies," he does not wish to see public sharing of data curtailed.

He said the researchers' intent was to spark community-wide dialogue as well as better educate study participants about the risks of sharing their genetic information.

"Our aim is to better illuminate the current status of identifiability of genetic data," he said. "More knowledge empowers participants to weigh the risks and benefits and make more informed decisions when considering whether to share their own data. We also hope that this study will eventually result in better security algorithms, better policy guidelines, and better legislation to help mitigate some of the risks described."

The names of the individuals identified by the study are not being released.

Prior to publication, the researchers shared their findings with officials at the National Human Genome Research Institute and the National Institute of General Medical Sciences. In response, the institutes relocated some demographic information from the publicly accessible portion of the NIGMS cell repository to help reduce the risk of future breaches.

NHGRI and NIGMS officials also published a separate perspectives piece in the same issue of Science, in which they called for an examination of approaches to balance research participants' privacy rights with the societal benefits to be realized from the sharing of biomedical research data.

"The willingness of individuals and communities to assume some risk to participate in biomedical research depends on the scientific community's ability to maintain the public's trust," NHGRI Director Eric Green and his colleagues from the two institutes wrote. "The ultimate goal must be to develop a robust system that ensures full societal benefits of biomedical research while respecting both individual needs and the communal good."


For a more in-depth report on the study, please see this article in GenomeWeb Daily News' sister publication BioInform.
