K-means Clustering Microaggregation for Statistical Disclosure Control
Authors
Abstract
This paper presents a K-means clustering technique for a bi-objective problem: minimizing information loss while maintaining k-anonymity. The proposed technique starts with one cluster and subsequently partitions the dataset into two or more clusters such that the total information loss across all clusters is minimized while the k-anonymity requirement is satisfied. The structure of the K-means clustering problem is defined and investigated, and an algorithm for the problem is developed. The performance of the K-means clustering algorithm is compared against the most recent microaggregation methods. Experimental results show that the K-means clustering algorithm incurs less information loss than the latest microaggregation methods in all of the test situations.
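The two objectives named in the abstract can be made concrete with a small sketch. It is not the paper's algorithm: it only evaluates a given partition. It assumes information loss is measured as the ratio SSE/SST (within-group sum of squared errors over total sum of squares), a measure commonly used in the microaggregation literature but not stated in the abstract, and that k-anonymity means every group holds at least k records. The function names (`information_loss`, `satisfies_k_anonymity`, `microaggregate`) are hypothetical, chosen for illustration.

```python
from typing import List, Sequence

Record = Sequence[float]


def centroid(rows: List[Record]) -> List[float]:
    """Attribute-wise mean of a non-empty list of records."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def sq_dist(a: Record, b: Record) -> float:
    """Squared Euclidean distance between two records."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def satisfies_k_anonymity(groups: List[List[int]], k: int) -> bool:
    """Assumed constraint: every group contains at least k record indices."""
    return all(len(g) >= k for g in groups)


def information_loss(records: List[Record], groups: List[List[int]]) -> float:
    """Assumed measure: SSE / SST, in [0, 1]; lower means less loss."""
    overall = centroid(records)
    sst = sum(sq_dist(r, overall) for r in records)
    sse = 0.0
    for g in groups:
        rows = [records[i] for i in g]
        c = centroid(rows)
        sse += sum(sq_dist(r, c) for r in rows)
    return sse / sst if sst else 0.0


def microaggregate(records: List[Record], groups: List[List[int]]) -> List[List[float]]:
    """Replace each record by the centroid of its group."""
    out: List[List[float]] = [[] for _ in records]
    for g in groups:
        c = centroid([records[i] for i in g])
        for i in g:
            out[i] = c
    return out


# Toy data: six one-attribute records, k = 3.
records = [[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]]
one_cluster = [[0, 1, 2, 3, 4, 5]]
two_clusters = [[0, 1, 2], [3, 4, 5]]

for name, groups in [("one cluster", one_cluster), ("two clusters", two_clusters)]:
    print(name, satisfies_k_anonymity(groups, 3), round(information_loss(records, groups), 4))
```

On this toy data both partitions satisfy the size constraint for k = 3, but splitting into two groups lowers the assumed loss measure from 1.0 to roughly 0.03, which illustrates why starting from a single cluster and then partitioning can reduce information loss as long as no group falls below k records.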
Similar resources
Repeated Record Ordering for Constrained Size Clustering
One of the main techniques used in data mining is data clustering, which has many applications in computer science, biology, and social sciences. Constrained clustering is a type of clustering in which side information provided by the user is incorporated into current clustering algorithms. One of the well-researched constrained clustering algorithms is called microaggregation. In a microaggreg...
Practical Data-Oriented Microaggregation for Statistical Disclosure Control
Microaggregation is a statistical disclosure control technique for microdata disseminated in statistical databases. Raw microdata (i.e., individual records or data vectors) are grouped into small aggregates prior to publication. Each aggregate should contain at least k data vectors to prevent disclosure of individual information, where k is a constant value preset by the data protector. No exa...
Record Ordering Heuristics for Disclosure Control through Microaggregation
Statistical disclosure control (SDC) methods reconcile the need to release information to researchers with the need to protect the privacy of individual records. Microaggregation is an SDC method that protects data subjects by guaranteeing k-anonymity: records are partitioned into groups of size at least k, and actual data values are replaced by the group means so that each record in the group is indi...
Novel Iterative Min-Max Clustering to Minimize Information Loss in Statistical Disclosure Control
In recent years, there has been an alarming increase in online identity theft and attacks using personally identifiable information. The goal of privacy preservation is to de-associate individuals from sensitive or microdata information. Microaggregation techniques seek to protect microdata in such a way that it can be published and mined without providing any private information that can be link...
Microdata Protection Method Through Microaggregation: A Systematic Approach
Microdata protection in statistical databases has recently become a major societal concern and has been intensively studied in recent years. Statistical Disclosure Control (SDC) is often applied to statistical databases before they are released for public use. Microaggregation for SDC is a family of methods to protect microdata from individual identification. SDC seeks to protect microdata in s...