De-identification methods help reduce the risk that patients whose information is housed in a database can be re-identified, according to a new study.
Researchers from the Cancer Registry of Norway applied both k-anonymization and a fuzzy factor to a set of nearly 5.7 million records from more than 911,000 women in the Norwegian Cervical Cancer Screening Program.
The researchers then tested those de-identification approaches with the ARX tool under a "prosecutor scenario," in which the attacker is assumed to already know some details about individuals in the dataset. As they report today in Cancer Epidemiology, Biomarkers & Prevention, simple steps could protect the data. For instance, setting the day in every date variable to the 15th of the month and adding a fuzzy factor to the months greatly reduced the risk of re-identification.
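To make the idea concrete, here is a minimal Python sketch of that kind of date blurring, followed by a simple risk check. It is an illustration only, not the researchers' code and not the ARX API: the function names, the sample dates, and the month shifts are all hypothetical. It assumes the common convention that, under a prosecutor scenario, a record's re-identification risk is one divided by the number of records that share the same values.

```python
from collections import Counter
from datetime import date


def blur_date(d: date, shift_months: int) -> date:
    """Set the day to the 15th and move the month by shift_months (the 'fuzzy factor')."""
    month_index = d.year * 12 + (d.month - 1) + shift_months
    year, month0 = divmod(month_index, 12)
    return date(year, month0 + 1, 15)


def prosecutor_risks(keys):
    """Per-record risk: 1 / (number of records sharing the same key)."""
    sizes = Counter(keys)
    return [1 / sizes[k] for k in keys]


# Hypothetical screening dates, one per record (not real registry data).
screening_dates = [
    date(2010, 3, 2),
    date(2010, 3, 19),
    date(2010, 3, 27),
    date(2010, 4, 1),
]

# Hypothetical fuzzy factors in months. They are fixed here so the example is
# reproducible; a real pipeline would presumably draw them at random and keep
# them secret.
shifts = [0, 0, 0, -1]

blurred = [blur_date(d, s) for d, s in zip(screening_dates, shifts)]

risk_before = max(prosecutor_risks(screening_dates))  # every date is unique
risk_after = max(prosecutor_risks(blurred))           # all four dates now coincide

print(risk_before)  # 1.0
print(risk_after)   # 0.25
```

In this toy case each exact date singles out one record, while the blurred dates place all four records in one group, so the highest per-record risk falls from 1.0 to 0.25.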
"We found that changing the dates using the standard procedure of k-anonymization drastically reduced the chances of re-identifying most individuals in the dataset," author Giske Ursin, director of the Cancer Registry of Norway, says in a statement.
She adds that people in charge of keeping sensitive data safe should also consider what information needs to be shared with other researchers to address their research questions and what can be held back. "[G]iven the recent trend in sharing data and combining datasets for big-data analyses — which is a good development — there is always a chance of information falling into the hands of someone with malicious intent," Ursin says.