Record Ordering Heuristics for Disclosure Control through Microaggregation
Authors
Abstract
Statistical disclosure control (SDC) methods reconcile the need to release information to researchers with the need to protect the privacy of individual records. Microaggregation is an SDC method that protects data subjects by guaranteeing k-anonymity: records are partitioned into groups of size at least k, and actual data values are replaced by the group means so that each record in a group is indistinguishable from at least k-1 other records. The goal is to create groups of similar records such that the information loss due to data modification is minimized, where information loss is measured by the sum of squared deviations between the actual data values and their group means. Since optimal multivariate microaggregation is NP-hard, heuristics have been developed for it. It has been shown that, for a given ordering of records, the optimal partition consistent with that ordering can be computed efficiently, and some of the best existing microaggregation methods are based on this approach. This paper improves on previous heuristics by adapting tour construction and tour improvement heuristics for the traveling salesman problem (TSP) to microaggregation. Specifically, the Greedy heuristic and the Quick Boruvka heuristic are investigated for tour construction, and the 2-opt, 3-opt, and Lin-Kernighan heuristics are used for tour improvement. Computational experiments using benchmark datasets indicate that our method results in lower information loss than extant microaggregation heuristics.
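The ordering-based step mentioned in the abstract can be illustrated with a small dynamic program. The sketch below is not the paper's implementation: the function names `group_sse` and `optimal_partition` are hypothetical, the record ordering is assumed to be given already (for example by a TSP tour, which is not constructed here), and group sizes are restricted to between k and 2k-1, a restriction commonly used in the microaggregation literature on the grounds that larger groups can be split without increasing the loss.

```python
from typing import List, Sequence, Tuple


def group_sse(records: Sequence[Sequence[float]]) -> float:
    """Sum of squared deviations of the records from their group mean."""
    n = len(records)
    dims = len(records[0])
    means = [sum(r[d] for r in records) / n for d in range(dims)]
    return sum((r[d] - means[d]) ** 2 for r in records for d in range(dims))


def optimal_partition(
    ordered: Sequence[Sequence[float]], k: int
) -> Tuple[float, List[Tuple[int, int]]]:
    """Best split of an ordered record list into consecutive groups.

    Assumes the ordering is fixed; every group is a contiguous slice
    ordered[i:j] with k <= j - i <= 2k - 1. Returns the total information
    loss (SSE) and the list of (start, end) slice bounds.
    """
    n = len(ordered)
    if k < 1 or n < k:
        raise ValueError("need k >= 1 and at least k records")

    inf = float("inf")
    best = [inf] * (n + 1)   # best[j]: minimal SSE for the first j records
    prev = [-1] * (n + 1)    # prev[j]: start index of the last group
    best[0] = 0.0

    for j in range(k, n + 1):
        # The last group is ordered[i:j]; try every admissible start i.
        for i in range(max(0, j - 2 * k + 1), j - k + 1):
            if best[i] == inf:
                continue  # the first i records cannot be partitioned
            cost = best[i] + group_sse(ordered[i:j])
            if cost < best[j]:
                best[j] = cost
                prev[j] = i

    # Walk back through prev to recover the group boundaries.
    groups: List[Tuple[int, int]] = []
    j = n
    while j > 0:
        i = prev[j]
        groups.append((i, j))
        j = i
    groups.reverse()
    return best[n], groups


if __name__ == "__main__":
    data = [[1.0], [2.0], [3.0], [10.0], [11.0], [12.0], [13.0]]
    loss, groups = optimal_partition(data, k=3)
    print(loss, groups)
```

With this toy one-dimensional input and k = 3, the only admissible splits are sizes (3, 4) and (4, 3), and the program picks the one with the smaller sum of squared deviations. A better ordering of the records can only lower the loss this step reports, which is the motivation for the tour construction and improvement heuristics.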
Similar resources
Repeated Record Ordering for Constrained Size Clustering
One of the main techniques used in data mining is data clustering, which has many applications in computer science, biology, and the social sciences. Constrained clustering is a type of clustering in which side information provided by the user is incorporated into existing clustering algorithms. One well-researched constrained clustering algorithm is microaggregation. In a microaggreg...
A Comparative Study on Microaggregation Techniques for Microdata Protection
Microaggregation is an efficient perturbative Statistical Disclosure Control (SDC) technique for microdata protection. It is a unified approach and naturally satisfies k-anonymity without generalization or suppression of data. Various microaggregation techniques, both fixed-size and data-oriented, exist in the literature for univariate and multivariate data. These methods have been evaluated using t...
Microdata Protection Through Approximate Microaggregation
Microdata protection is a hot topic in the field of Statistical Disclosure Control, which has gained special interest after the disclosure of 658,000 queries by the America Online (AOL) search engine in August 2006. Many algorithms, methods and properties have been proposed to deal with microdata disclosure. One of the emerging concepts in microdata protection is k-anonymity, introduced by Samara...
A polynomial-time approximation to optimal multivariate microaggregation
Microaggregation is a family of methods for statistical disclosure control (SDC) of microdata (records on individuals and/or companies), that is, for masking microdata so that they can be released without disclosing private information on the underlying individuals. Microaggregation techniques are currently being used by many statistical agencies. The principle of microaggregation is to group o...
On the Complexity of Optimal Microaggregation for Statistical Disclosure Control
Statistical disclosure control (SDC), also termed inference control two decades ago, is an integral part of data security dealing with the protection of statistical databases. The basic problem in SDC is to release data in a way that does not lead to disclosure of individual information (high security) but preserves the informational content as much as possible (low information loss). SDC is du...