Project

Bringing dataset anonymization to practice

Data collection and data processing have become part of the core business of many companies, and strategic decisions are increasingly based on the collected data. Many companies could benefit from enriching their models with data from other providers (academic, governmental, third parties…), and selling collected data to third parties can generate extra revenue. The GDPR, in effect since May 2018, supports data exchange between entities, provided that the data is anonymized.

The academic literature describes many privacy metrics, such as k-anonymity and l-diversity. These metrics, however, describe the privacy of an anonymized dataset by considering that dataset in isolation. In the real world, background knowledge can be merged with the anonymized data to reveal more information than anticipated. Such background knowledge is not restricted to other datasets (governmental citizen data, datasets released by other commercial partners…): it also includes different anonymized versions of the same dataset, or even the same dataset at a different point in time. Several privacy incidents of this kind are described in academic research (prototypical cases are the Netflix, credit card and New York taxi datasets), but approaches to tackle the underlying challenges are rarely proposed.

The goal of this PhD is to assess, in practice, the real privacy of (anonymized) datasets. The impact of different de-anonymization techniques on the privacy properties of anonymized datasets is studied and extended, and attacks are demonstrated on real datasets. In a later stage of the PhD, the insights gained from performing these de-anonymization attacks are applied to strengthen existing anonymization techniques.
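To make the metrics and the linkage risk concrete, the following minimal sketch (a toy example; the dataset, field names such as `zip` and `diagnosis`, and all values are purely illustrative and not from the project) computes k-anonymity and l-diversity for a small anonymized release, then shows how an attacker with background knowledge of a target's quasi-identifiers can still learn the sensitive value:

```python
from collections import Counter, defaultdict

# Toy "anonymized" release: zip prefix and age range are quasi-identifiers,
# diagnosis is the sensitive attribute. All values are illustrative.
records = [
    {"zip": "476**", "age": "20-29", "diagnosis": "flu"},
    {"zip": "476**", "age": "20-29", "diagnosis": "flu"},
    {"zip": "476**", "age": "20-29", "diagnosis": "cancer"},
    {"zip": "479**", "age": "30-39", "diagnosis": "hepatitis"},
    {"zip": "479**", "age": "30-39", "diagnosis": "hepatitis"},
]
quasi_identifiers = ("zip", "age")
sensitive = "diagnosis"

def k_anonymity(rows, qi):
    """k = size of the smallest equivalence class over the quasi-identifiers."""
    classes = Counter(tuple(r[a] for a in qi) for r in rows)
    return min(classes.values())

def l_diversity(rows, qi, sens):
    """l = fewest distinct sensitive values in any equivalence class."""
    classes = defaultdict(set)
    for r in rows:
        classes[tuple(r[a] for a in qi)].add(r[sens])
    return min(len(values) for values in classes.values())

print(k_anonymity(records, quasi_identifiers))             # 2
print(l_diversity(records, quasi_identifiers, sensitive))  # 1

# Linkage with background knowledge: an attacker who knows a target's
# quasi-identifiers intersects them with the release. The release is
# 2-anonymous, but the matching equivalence class has l-diversity 1,
# so the sensitive value is disclosed outright.
background = {"zip": "479**", "age": "30-39"}  # known about the target
matches = [r for r in records
           if all(r[a] == background[a] for a in quasi_identifiers)]
print({r[sensitive] for r in matches})  # {'hepatitis'}
```

The example illustrates the project's central point: a dataset can satisfy a privacy metric computed in isolation and still leak once joined with outside knowledge.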

Date: 16 Dec 2020 → Today
Keywords: Privacy, Anonymization, Anonymity, GDPR, Datasets
Disciplines: Cryptography, privacy and security
Project type: PhD project