Privacy Preserving Linear Regression on Distributed Databases

نویسنده

  • Fida Kamal Dankar
چکیده

Studies that combine data from multiple sources can tremendously improve the outcome of the statistical analysis. However, combining data from these various sources for analysis poses privacy risks. A number of protocols have been proposed in the literature to address the privacy concerns; however they do not fully deliver on either privacy or complexity. In this paper, we present a (theoretical) privacy preserving linear regression model for the analysis of data owned by several sources. The protocol uses a semi-trusted third party and delivers on privacy and complexity.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Privacy-Preserving Maximum Likelihood Estimation for Distributed Data

Recent technological advances enable the collection of huge amounts of data. Commonly, these data are generated, stored, and owned by multiple entities that are unwilling to cede control of their data. This distributed environment requires statistical tools that can produce correct results while preserving data privacy. Privacy-preserving protocols have been proposed to solve specific statistic...

متن کامل

Privacy Preserving Data Mining For Horizontally Distributed Medical Data Analysis

To build reliable prediction models and identify useful patterns, assembling data sets from databases maintained by different sources such as hospitals becomes increasingly common; however, it might divulge sensitive information about individuals and thus leads to increased concerns about privacy, which in turn prevents different parties from sharing information. Privacy Preserving Distributed ...

متن کامل

Privacy-Preserving Distributed Linear Regression on High-Dimensional Data

We propose privacy-preserving protocols for computing linear regression models, in the setting where the training dataset is vertically distributed among several parties. Our main contribution is a hybrid multi-party computation protocol that combines Yao’s garbled circuits with tailored protocols for computing inner products. Like many machine learning tasks, building a linear regression model...

متن کامل

Efficient, Accurate and Privacy-Preserving Data Mining for Frequent Itemsets in Distributed Databases

Mining distributed databases is emerging as a fundamental computational problem. A common approach for mining distributed databases is to move all of the data from each database to a central site and a single model is built. This approach is accurate, but too expensive in terms of time required. For this reason, several approaches were developed to efficiently mine distributed databases, but th...

متن کامل

"Secure" Log-Linear and Logistic Regression Analysis of Distributed Databases

The machine learning community has focused on confidentiality problems associated with statistical analyses that “integrate” data stored in multiple, distributed databases where there are barriers to simply integrating the databases. This paper discusses various techniques which can be used to perform statistical analysis for categorical data, especially in the form of log-linear analysis and l...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Trans. Data Privacy

دوره 8  شماره 

صفحات  -

تاریخ انتشار 2015