Project

Privacy-preserving core sets

Empirical risk minimization lies at the core of modern machine learning. In this framework, the expected value of a loss
function is minimized, with the expectation taken over data that are representative of the real world. In many
applications, it is helpful to obtain a summary of a dataset that preserves important aspects of its distribution while
removing private information that would allow an adversary to recover the identity of individuals. The aspects of the
distribution to be preserved are those that allow for good generalization, and the transformed representation of the data
should usually be compact to save resources. In this project, we will explore the use of privacy-preserving core sets. A core
set is a sparse representation of a larger dataset that preserves the performance of a machine learning algorithm. Because a
core set retains only a small part of the data, its construction may have an intrinsic privacy-preserving property; however,
information can still leak, even about points that are not included in the core set, through the way the points are selected.
This project will develop core sets that simultaneously guarantee good statistical learning and provide differential privacy
guarantees. In doing so, we will improve the efficiency and performance of machine learning algorithms while ensuring the
safety of the personal data used in their training.
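
To make the notion of a core set concrete, the following is a minimal sketch, not the method developed in this project. It assumes the simplest possible construction (uniform sampling with weights n/m) and a one-dimensional squared loss; the names `uniform_coreset`, `full_loss`, and `coreset_loss` are hypothetical and chosen only for illustration. It shows that the weighted loss on the small sample approximates the loss on the full dataset. It provides no differential privacy guarantee.

```python
import random


def uniform_coreset(points, m, rng):
    """Pick m points uniformly without replacement; weight each by n/m.

    Uniform sampling is only a baseline construction (an assumption of
    this sketch), not a privacy-preserving one.
    """
    n = len(points)
    sample = rng.sample(points, m)
    weight = n / m
    return [(x, weight) for x in sample]


def full_loss(points, theta):
    """Sum of squared losses of the parameter theta over the full dataset."""
    return sum((x - theta) ** 2 for x in points)


def coreset_loss(coreset, theta):
    """Weighted sum of squared losses over the core set."""
    return sum(w * (x - theta) ** 2 for x, w in coreset)


rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(10_000)]  # synthetic data
coreset = uniform_coreset(data, 500, rng)

theta = 0.5  # an arbitrary parameter value at which to compare the losses
exact = full_loss(data, theta)
approx = coreset_loss(coreset, theta)
rel_err = abs(exact - approx) / exact
print(f"full loss {exact:.1f}, core set estimate {approx:.1f}, "
      f"relative error {rel_err:.3f}")
```

In this sketch the selected points are released unchanged and the selection depends on the data, which is the kind of leakage the project aims to control.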

Date: 1 Apr 2021 → Today

Keywords: privacy-preserving core sets, machine learning algorithms

Disciplines: Machine learning and decision making

