Family Ties Can Compromise Genomic Data Privacy, New Studies Suggest

NEW YORK (GenomeWeb) – Two papers published today in Science and Cell are providing additional evidence that an individual's genomic data can reveal that person's family relationships when run through databases accessible to both law enforcement organizations and the general public.

In Science, researchers led by Yaniv Erlich of Columbia University and the New York Genome Center showed how databases used in genetic genealogy can reveal family relationships for many Americans with European ancestry. They analyzed 1.2 million individuals who had undergone genotyping by consumer genomics company MyHeritage, where Erlich serves as chief scientific officer.

"Our results show that nearly 60 percent of long-range familial searches return a relative" who is a third cousin (someone with whom the person shares a great-great-grandparent) or closer, the authors wrote.

In Cell, researchers led by Stanford University professor Noah Rosenberg described an algorithm that allowed them to identify parent-child and sibling relationships between individuals whose genomic data were of different types. Specifically, they matched individuals where one person's data were available from SNP genotyping, which is commonly used in consumer genomics, and the other's were available as a short tandem repeat (STR) profile, which is used in DNA databases run by law enforcement.

In a simulated data set of 218 SNP and 218 STR profiles, in which each profile had a "match" to a profile of the other data type, the researchers were able to identify up to 32 percent of parent-offspring pairs and up to 36 percent of sibling pairs.

"The two papers speak about the issues of genetic privacy and convergence of consumer databases and forensic work," Erlich told GenomeWeb. Researchers from both teams also said that their papers had implications for genomic data used in research.

Law enforcement operations have made headlines this year for using genetic genealogy to identify suspects in high-profile cold cases. In April, officials in California arrested a man alleged to be the serial murderer and rapist dubbed the Golden State Killer. And in May, officials in Washington State announced an arrest for a double murder from 1987.

The power of genomic data to help identify individuals has been demonstrated in previous work by each of the research groups that published today's Science and Cell studies. In 2013, Erlich published a paper in Science that demonstrated that STR profiling could help lead investigators to a family surname, which — when used with other publicly available data — could be used to identify people.

And last year, researchers led by Stanford's Rosenberg published a paper in the Proceedings of the National Academy of Sciences detailing how they were able to match SNP profiles with STR profiles of the same individual. This method used linkage disequilibrium between the two genetic marker types to find markers that were often inherited together. Thus, they were able to harmonize the data types and use them for identification.

As forensic DNA methods inch towards next-generation sequencing-based technologies, this kind of research is adding to the evidence that shows how SNP genotyping has changed what's possible in forensic genetics.

"With SNPs, everything changes," said Michael Edge, a postdoc at the University of California, Davis and a co-author of the Cell paper. "In [Erlich's paper], they're talking about how using these [SNP profiles], you can ID these long distance familial relationships. Ours is about connecting the old sources of info to the new ones."

Soon, "nearly every person in the US with European heritage" could be identified using genetic data, Erlich added. "Even if a specific individual is not in these databases, a relative of theirs could be, which is enough to identify them."

In the paper, his team suggested that only 2 percent of a target population needed to be in a given database to find a third cousin of the person of interest. "If you have the genealogical records, that is easy to trace back," Erlich noted.
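To see why such a small share of the population can be enough, a back-of-the-envelope calculation helps. The sketch below is not the model from the Science paper. It assumes that each relative is in the database independently with probability equal to the database's coverage, and it uses a hypothetical count of 200 third cousins per person (the function name and that count are illustrative assumptions, not figures from the study). It also ignores whether a given cousin shares enough DNA to be detected as a match.

```python
def prob_at_least_one_relative(coverage: float, n_relatives: int) -> float:
    """Chance that at least one of `n_relatives` is in a database that
    holds a fraction `coverage` of the target population.

    Simplifying assumption: each relative is in the database
    independently with probability `coverage`.
    """
    if not 0.0 <= coverage <= 1.0:
        raise ValueError("coverage must be between 0 and 1")
    if n_relatives < 0:
        raise ValueError("n_relatives must be non-negative")
    # Complement of "none of the relatives is in the database".
    return 1.0 - (1.0 - coverage) ** n_relatives


# Hypothetical number of third cousins per person (an assumption for
# illustration only, not a figure reported in the paper).
ASSUMED_THIRD_COUSINS = 200

if __name__ == "__main__":
    for coverage in (0.005, 0.01, 0.02):
        p = prob_at_least_one_relative(coverage, ASSUMED_THIRD_COUSINS)
        print(f"coverage {coverage:.1%}: P(at least one third cousin) = {p:.3f}")
```

Under these assumptions, the probability climbs quickly with coverage because a person has many distant relatives, so each one only needs a small chance of being in the database.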

And it's not just suspects in law enforcement investigations that could be identified. "Research subjects could be identified using this same strategy," he said. In the paper, the authors argued that consumer genomics companies should band together to develop cryptographic measures to help control the risk of de-anonymization of records.

The Cell paper's authors concurred, noting that their paper also had implications for both forensics and genomic research.

"It's important for the public to be aware that information between these two types of genetic data can be connected, often in unexpected ways," Rosenberg said in a statement. "There's a legacy problem in that so many DNA profiles have been collected with this older genetic marker system that's been used by law enforcement since the 1990s. The system is not designed for the more challenging queries that are currently of interest, such as identifying people represented in a DNA mixture or identifying relatives of the contributor of a DNA sample."

Edge noted that the algorithm could provide backward compatibility to older genetic data sets. "We're talking about STRs and SNPs, but nothing about the algorithm is specific to those" data types, he said. "Maybe you have SNPs and SNPs, but they’re different. Or maybe you have an old, sparse SNP set, but you have the same people in your study. You could use the algorithm in that situation too."