People can easily be re-identified from data that has been anonymized, Technology Review reports.
Researchers from Imperial College London and the Catholic University of Louvain developed a model using data from five publicly available sources, including the US Census. As they report this week in Nature Communications, the model showed that people could be re-identified from anonymized data with high accuracy. In particular, they found that 81 percent of Americans could be identified from an anonymized dataset based on three demographic data points (ZIP code, gender, and date of birth), and that 99.98 percent of Massachusetts residents could be identified based on 15 demographic data points.
"Looking at a dataset — there are a lot of people who are in their 30s, male, and living in New York City," Imperial College London's Yves-Alexandre de Montjoye tells New Scientist in a Q&A. "So it might not be me that you have re-identified. However, if I also know the person I'm searching for was born on January 5, is driving a red Mazda, has two kids, both of them are girls, has one dog, is living in a specific borough in New York City, then I have a pretty good chance to have identified the right person."
This, de Montjoye tells Technology Review, shows that even anonymized data isn't safe. He adds in the New Scientist interview that new cryptographic techniques could be employed.
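The narrowing effect de Montjoye describes, where each extra attribute leaves fewer people who match, can be illustrated with a minimal sketch. This is not the researchers' statistical model. It simply counts how many records in a small, entirely invented dataset become unique as more attributes are combined. The records, the attribute names, and the helper unique_fraction are all hypothetical.

```python
from collections import Counter

# Hypothetical toy records: every value here is invented for illustration
# and has nothing to do with the datasets used in the study.
RECORDS = [
    {"age_band": "30s", "gender": "M", "city": "NYC", "birthday": "01-05", "car": "red Mazda"},
    {"age_band": "30s", "gender": "M", "city": "NYC", "birthday": "01-05", "car": "blue Honda"},
    {"age_band": "30s", "gender": "M", "city": "NYC", "birthday": "07-19", "car": "red Mazda"},
    {"age_band": "30s", "gender": "M", "city": "NYC", "birthday": "11-02", "car": "grey Ford"},
    {"age_band": "30s", "gender": "F", "city": "NYC", "birthday": "01-05", "car": "red Mazda"},
    {"age_band": "40s", "gender": "F", "city": "NYC", "birthday": "03-30", "car": "blue Honda"},
]

# Attributes in the order they are added to the search.
ATTRS = ["age_band", "gender", "city", "birthday", "car"]


def unique_fraction(records, attributes):
    """Return the share of records whose combination of `attributes`
    appears exactly once, i.e. records that the combination singles out."""
    keys = [tuple(record[a] for a in attributes) for record in records]
    counts = Counter(keys)
    return sum(1 for key in keys if counts[key] == 1) / len(records)


if __name__ == "__main__":
    # Add one attribute at a time and report how many records are unique.
    for n in range(1, len(ATTRS) + 1):
        subset = ATTRS[:n]
        print(f"{n} attribute(s) {subset}: {unique_fraction(RECORDS, subset):.0%} unique")
```

In this toy set, age band alone singles out one record in six, while all five attributes together single out every record. Real datasets are far larger, but the same mechanism is why removing names alone does not make a record anonymous.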