Semantic attack on transaction data anonymised by set-based generalisation

Author

  • Hoang Ong
Abstract

Publishing data that contains information about individuals may lead to privacy breaches. However, data publishing is useful for supporting research and analysis. Privacy protection in data publishing has therefore become important and has received much recent attention. To improve privacy protection, many researchers have investigated how secure published data is by designing de-anonymisation methods to attack anonymised data. Most de-anonymisation methods treat anonymised data in a syntactic manner: items in a dataset are considered contextless or even meaningless literals, and the semantics of these data items are not taken into account. In this thesis, we investigate how secure anonymised data is under attacks that use semantic information. More specifically, we propose a de-anonymisation method to attack transaction data anonymised by set-based generalisation. Set-based generalisation protects data by replacing one item with a set of items, so that the identity of an individual can be hidden. Our goal is to identify the items that are added to a transaction during generalisation. Our attack method has two components: scoring and elimination. Scoring measures the semantic relationship between items in a transaction, and elimination removes items that are deemed not to be in the original transaction. Our experiments on both real and synthetic data show that set-based generalisation may not provide adequate protection for transaction data, and about 70% of the items added to the transactions during generalisation can be detected by our method with a precision
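
The idea of set-based generalisation followed by a scoring-and-elimination attack can be illustrated with a small sketch. This is not the thesis's implementation: the abstract does not specify the semantic measure or the elimination rule, so the category table `CATEGORY`, the functions `similarity`, `generalise`, `score` and `eliminate`, and the "drop anything scoring below the best item in its set" rule are all hypothetical choices made only for illustration.

```python
# Toy illustration only; every name and rule here is an assumption.

# Hypothetical semantic knowledge: each item belongs to one category.
CATEGORY = {
    "bread": "food", "milk": "food", "butter": "food",
    "hammer": "tools", "nails": "tools",
}


def similarity(a, b):
    """Toy semantic relatedness: 1.0 for the same category, else 0.0."""
    return 1.0 if CATEGORY[a] == CATEGORY[b] else 0.0


def generalise(transaction, groups):
    """Set-based generalisation: replace each item by the set that contains it."""
    return {groups.get(item, frozenset([item])) for item in transaction}


def score(item, published):
    """Mean similarity between `item` and every other item in the transaction."""
    others = [o for g in published for o in g if o != item]
    if not others:
        return 0.0
    return sum(similarity(item, o) for o in others) / len(others)


def eliminate(published):
    """Within each generalised set, drop items scoring below the best one."""
    removed = set()
    for g in published:
        scores = {i: score(i, published) for i in g}
        best = max(scores.values())
        removed |= {i for i, s in scores.items() if s < best}
    return removed


# Assumed generalisation sets: each pairs a food item with a tool.
bread_set = frozenset({"bread", "nails"})
milk_set = frozenset({"milk", "hammer"})
groups = {"bread": bread_set, "nails": bread_set,
          "milk": milk_set, "hammer": milk_set}

original = {"bread", "milk", "butter"}
published = generalise(original, groups)
print(sorted(eliminate(published)))  # ['hammer', 'nails']
```

In this toy case the two items introduced by generalisation are semantically unrelated to the rest of the transaction, so they receive the lowest scores and are flagged. Real data would need a graded similarity measure and a tuned threshold.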

Similar resources

Priority-Based k-Anonymity Accomplished by Weighted Generalisation Structures

Biobanks are gaining in importance by storing large collections of patients' clinical data (e.g. disease history, laboratory parameters, diagnosis, lifestyle) together with biological materials such as tissue samples, blood or other body fluids. When these patient-specific data are released for medical studies, privacy protection has to be guaranteed for ethical and legal reasons. k-anonymity may b...

Decision support for releasing anonymised data

For legal and privacy reasons, it is often prescribed that databases containing sensitive personal data can be published only in anonymised form. History shows, however, that the privacy of anonymised data is in many cases easily broken by de-anonymisation attacks. This paper defines guiding principles for decisions about releasing anonymised data and provides a simple process for analysing de-...

Semantic role labelling with similarity-based generalization using EM-based clustering

We describe a system for semantic role assignment built as part of the Senseval III task, based on an off-the-shelf parser and Maxent and Memory-Based learners. We focus on generalisation using several similarity measures to increase the amount of training data available and on the use of EM-based clustering to improve role assignment. Our final score is Precision=73.6%, Recall=59.4% (F=65.7).

SEIMCHA: a new semantic image CAPTCHA using geometric transformations

As the protection of web applications is becoming more important every day, CAPTCHAs are attracting growing attention from both users and designers. It is now well accepted that using visual concepts enhances the security and usability of CAPTCHAs. There are a few major ideas for designing image CAPTCHAs. Some methods apply a set of modifications such as rotations to the original imag...

A Semantic-Based Transaction Model for Active Heterogeneous Database Systems

This paper presents a framework for processing transactions in active heterogeneous database systems. To produce a correct schedule of transactions with high performance, the framework provides semantic-based concurrency control at the global level. It relaxes the correctness criterion (global serializability) to allow the global sub-transactions on each site to execute in differe...

Publication date: 2015