Harvard University researchers were able to correctly name between 84 percent and 97 percent of the Personal Genome Project participants they matched to public records, using demographic information the participants had provided.
The Personal Genome Project, led by George Church at Harvard Medical School, began about a half dozen years ago to examine the interactions among genotype, the environment, and phenotype. Project participants provide genomic information as well as health information and some other personal data. The project notes that privacy cannot be guaranteed, though the online profiles do not include names or addresses.
Latanya Sweeney and her colleagues write in a paper, available at the arXiv preprint server, that they matched demographic data, such as ZIP code, age, and gender, from 579 of 1,130 public PGP profiles to data housed in voter or other public records. From this, they obtained 241 unique matches.
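The kind of linkage described above can be illustrated with a minimal sketch. It is not the researchers' code: the function name `unique_matches`, the field names, and all of the sample records are invented for illustration, and the only assumption carried over from the article is that profiles and public records share ZIP code, age, and gender. A profile counts as a "unique match" when exactly one public record has the same combination.

```python
from collections import defaultdict

def unique_matches(profiles, public_records, keys=("zip", "age", "gender")):
    """Return {profile id: name} for profiles matching exactly one public record."""
    # Index the public records by their demographic combination.
    index = defaultdict(list)
    for record in public_records:
        index[tuple(record[k] for k in keys)].append(record["name"])

    matches = {}
    for profile in profiles:
        candidates = index.get(tuple(profile[k] for k in keys), [])
        # Zero candidates means no match; several means the match is ambiguous.
        if len(candidates) == 1:
            matches[profile["id"]] = candidates[0]
    return matches

# Toy data, entirely made up.
profiles = [
    {"id": "p1", "zip": "02138", "age": 41, "gender": "F"},
    {"id": "p2", "zip": "02139", "age": 35, "gender": "M"},
    {"id": "p3", "zip": "02140", "age": 50, "gender": "F"},
]
public_records = [
    {"name": "Alice Example", "zip": "02138", "age": 41, "gender": "F"},
    {"name": "Bob Example", "zip": "02139", "age": 35, "gender": "M"},
    {"name": "Carl Example", "zip": "02139", "age": 35, "gender": "M"},
]

print(unique_matches(profiles, public_records))
```

In this toy example only the first profile is named: the second shares its demographics with two public records, and the third appears in none.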
The researchers shared their findings with the PGP to determine the accuracy of their matches. According to the response from the PGP, 84 percent of Sweeney and her team's matches were correct, and, the researchers note, if nicknames are taken into consideration, the proportion of correct matches rises to 97 percent. "Our ability to learn their names is based on their demographics, not their DNA, thereby revisiting an old vulnerability that could be easily thwarted with minimal loss of research value," they write.
Sweeney and her colleagues add that by making age and location data a bit fuzzier, PGP participants would be able to protect themselves from identification. They note that they developed a tool to help PGP participants make such changes.
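As a rough illustration of what "fuzzier" could mean in practice, the sketch below coarsens a profile by truncating the ZIP code and replacing the exact age with a range. This is not the tool the researchers built: the function name `generalize`, its parameters, and the choice of three ZIP digits and five-year age bands are all assumptions made for the example.

```python
def generalize(profile, zip_digits=3, age_band=5):
    """Return a copy of the profile with a coarser ZIP code and an age range."""
    zip_code = profile["zip"]
    low = (profile["age"] // age_band) * age_band
    return {
        **profile,
        # Keep only the leading digits of the ZIP code and mask the rest.
        "zip": zip_code[:zip_digits] + "*" * (len(zip_code) - zip_digits),
        # Replace the exact age with the band that contains it.
        "age": f"{low}-{low + age_band - 1}",
    }

# Hypothetical profile, made up for illustration.
example = {"id": "p1", "zip": "02138", "age": 41, "gender": "F"}
print(generalize(example))
```

With coarser values, more people share each demographic combination, so fewer profiles line up with exactly one public record.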
"That should make the Personal Genome Project significantly more private for those who choose this option. It should also serve as a warning for future projects involving personal data that privacy isn't always as easy to protect as it might at first seem," the Physics arXiv blog adds.