Private Exploration Primitives for Data Cleaning

Authors

  • Chang Ge
  • Ihab F. Ilyas
  • Xi He
  • Ashwin Machanavajjhala
Abstract

Data cleaning is the process of detecting and repairing inaccurate or corrupt records in the data. Data cleaning is inherently human-driven, and state-of-the-art systems assume that cleaning experts can access the data to tune the cleaning process. However, in sensitive datasets such as electronic medical records, privacy constraints disallow unfettered access to the data. To address this challenge, we propose a utility-aware differentially private framework that allows a data cleaner to query the private data for a given cleaning task, while the data owner tracks the privacy loss incurred by these queries. In this paper, we first identify a set of primitives based on counting queries for general data cleaning tasks and show that, even when the query answers contain some error, these cleaning tasks can be completed with reasonably good quality. We also design a privacy engine that translates the per-query accuracy requirement specified by the data cleaner into a differential privacy loss parameter and ensures that all queries are answered under differential privacy. Through extensive experiments using blocking and matching as examples, we demonstrate that our approach achieves plausible cleaning quality and outperforms prior approaches to cleaning private data.
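The privacy engine described above maps a per-query accuracy requirement to a privacy loss parameter. The abstract does not spell out that mapping, so the following is only a minimal sketch under stated assumptions: each primitive is a single counting query (sensitivity 1) answered with the Laplace mechanism; the cleaner's accuracy requirement is a pair (α, β) meaning the noisy count lies within α of the true count with probability at least 1 − β; and the owner's total loss is tracked by sequential composition, where the ε values of successive queries add up. Because a Laplace variable with scale 1/ε satisfies P(|X| > α) = exp(−αε), the smallest ε meeting the requirement is ln(1/β)/α. The names `epsilon_for_accuracy`, `laplace_noise`, and `PrivacyEngine` are hypothetical and are not taken from the paper.

```python
import math
import random


def epsilon_for_accuracy(alpha: float, beta: float) -> float:
    """Hypothetical helper: smallest epsilon such that a Laplace-noised count
    (sensitivity 1) is off by more than alpha with probability at most beta."""
    if alpha <= 0 or not (0 < beta < 1):
        raise ValueError("need alpha > 0 and 0 < beta < 1")
    # Laplace(0, b) satisfies P(|X| > alpha) = exp(-alpha / b); with b = 1/eps,
    # exp(-alpha * eps) <= beta  <=>  eps >= ln(1/beta) / alpha.
    return math.log(1.0 / beta) / alpha


def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


class PrivacyEngine:
    """Hypothetical engine: answers counting queries under epsilon-DP and
    tracks cumulative privacy loss by sequential composition (an assumption)."""

    def __init__(self, records, budget: float, seed=None):
        self.records = records
        self.budget = budget
        self.spent = 0.0
        self.rng = random.Random(seed)

    def count(self, predicate, alpha: float, beta: float) -> float:
        eps = epsilon_for_accuracy(alpha, beta)
        # Reject before touching the data, so a refused query costs nothing.
        if self.spent + eps > self.budget:
            raise RuntimeError("privacy budget exhausted")
        self.spent += eps  # losses add up under sequential composition
        true_count = sum(1 for r in self.records if predicate(r))
        return true_count + laplace_noise(1.0 / eps, self.rng)


# Usage: a blocking-style count over toy records (illustrative data only).
records = [{"zip": "02139"}] * 50 + [{"zip": "10001"}] * 30
engine = PrivacyEngine(records, budget=1.0, seed=0)
noisy = engine.count(lambda r: r["zip"] == "02139", alpha=10, beta=0.05)
print(round(engine.spent, 4))  # 0.2996, i.e. ln(20) / 10
```

In this sketch a looser requirement (larger α or β) yields a smaller ε, so the cleaner can trade answer accuracy for a slower drain on the owner's budget; the real framework may use a different mechanism or accounting rule.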

Similar resources

Towards a Domain Independent Platform for Data Cleaning

We present a domain independent platform for data cleaning developed as part of the Data Cleaning project at Microsoft Research. Our platform consists of a set of core primitives and design tools that allow a programmer to develop sophisticated data cleaning solutions with minimal programming effort. Our primitives are designed to allow rich domain and application specific customizations and ca...

Declarative Cleaning, Analysis, and Querying of Graph-structured Data

Title of dissertation: Declarative Cleaning, Analysis, and Querying of Graph-Structured Data. Walaa Eldin Moustafa, Doctor of Philosophy, 2013. Dissertation directed by: Professor Amol Deshpande and Professor Lise Getoor, Department of Computer Science. Much of today’s data, including social, biological, sensor, computer, and transportation network data, is naturally modeled and represented by graphs. ...

An Exploration of Teachers' Beliefs about the Role of Grammar in Iranian High Schools and Private Language Institutes

This study was an attempt to explore the beliefs of Iranian EFL teachers about the role of grammar in English language teaching in both state schools and private language institutes. Data were collected through a questionnaire developed by Burgess and Etherington (2002), which consisted of 11 main subscales and was divided into two sections. The first section dealt with approaches to grammar te...

Parleda: a Library for Parallel Processing in Computational Geometry Applications

ParLeda is a software library that provides the basic primitives needed for parallel implementation of computational geometry applications. It can also be used in implementing a parallel application that uses geometric data structures. The parallel model that we use is based on a new heterogeneous parallel model named HBSP, which is based on BSP and is introduced here. ParLeda uses two main lib...

Lightweight 4x4 MDS Matrices for Hardware-Oriented Cryptographic Primitives

A linear diffusion layer is an important part of lightweight block ciphers and hash functions. This paper presents an efficient class of lightweight 4x4 MDS matrices whose implementation cost equals that of their corresponding inverses. The main target of the paper is hardware-oriented cryptographic primitives, and the implementation cost is measured in terms of the required number ...

Journal title:
  • CoRR

Volume: abs/1712.10266

Publication date: 2017