Microaggregation- and Permutation-Based Anonymization of Mobility Data
Abstract
Movement data, that is, trajectories of mobile objects, are automatically collected in huge quantities by technologies such as GPS, GSM or RFID. Publishing and exploiting such data is essential to improve transportation, to understand the dynamics of the economy in a region, etc. However, there are obvious threats to the privacy of individuals if their trajectories are published in a way that allows re-identification of the individual behind a trajectory. We contribute to the literature on privacy-preserving publication of trajectories by presenting a distance measure for trajectories which naturally considers both spatial and temporal aspects of trajectories, is computable in polynomial time, and can cluster trajectories not defined over the same time span. Our distance measure can be naturally instantiated using other existing similarity measures for trajectories that are appropriate for anonymization purposes. Then, we propose two heuristics for trajectory anonymization which yield anonymized trajectories formed by fully accurate true original locations. The first heuristic is based on trajectory microaggregation using the above distance and on location permutation; it effectively achieves trajectory k-anonymity. The second heuristic is based only on location permutation; it gives up trajectory k-anonymity and aims at location k-diversity. The strong point of the second heuristic is that it takes into account reachability constraints when computing anonymized trajectories. Experimental results on a synthetic data set and a real-life data set are presented; for similar privacy protection levels and most reasonable parameter choices, our two methods offer better utility than comparable previous proposals in the literature.
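The general idea of combining microaggregation with location permutation can be illustrated with a toy sketch. It is not the paper's method: `toy_distance`, `greedy_clusters` and `permute_locations` are hypothetical names, the distance is a plain mean Euclidean distance that assumes all trajectories are sampled at the same time steps (a restriction the paper's distance is said to avoid), and reachability constraints are ignored. It only shows how clusters of at least k trajectories can be formed and how true original locations can be shuffled within each cluster.

```python
import math
import random


def toy_distance(a, b):
    """Mean Euclidean distance between same-index locations.

    Assumption for this sketch only: both trajectories have the same
    length and are sampled at the same time steps.
    """
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)


def greedy_clusters(trajs, k):
    """Group trajectory indices into clusters of at least k members."""
    if len(trajs) < k:
        raise ValueError("need at least k trajectories")
    remaining = list(range(len(trajs)))
    clusters = []
    while len(remaining) >= 2 * k:
        seed = remaining[0]
        # The seed has distance 0 to itself, so it is always selected.
        nearest = sorted(
            remaining, key=lambda i: toy_distance(trajs[seed], trajs[i])
        )[:k]
        clusters.append(nearest)
        remaining = [i for i in remaining if i not in nearest]
    # Between k and 2k-1 trajectories are left; they form the last cluster.
    clusters.append(remaining)
    return clusters


def permute_locations(trajs, clusters, rng):
    """Shuffle the locations of each time step among cluster members."""
    out = [list(t) for t in trajs]
    for cluster in clusters:
        for step in range(len(trajs[0])):
            locs = [trajs[i][step] for i in cluster]
            rng.shuffle(locs)
            for i, loc in zip(cluster, locs):
                out[i][step] = loc
    return out


# Four made-up trajectories of (x, y) locations at two shared time steps.
trajs = [
    [(0.0, 0.0), (1.0, 0.0)],
    [(0.1, 0.0), (1.1, 0.0)],
    [(5.0, 5.0), (6.0, 5.0)],
    [(5.1, 5.0), (6.1, 5.0)],
]
clusters = greedy_clusters(trajs, k=2)
anonymized = permute_locations(trajs, clusters, random.Random(0))
```

In this sketch every published location is one of the original locations, and each trajectory is grouped with at least k - 1 others whose locations it may have exchanged. Whether that grouping amounts to the formal trajectory k-anonymity claimed in the abstract depends on details that the abstract does not give.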
Similar resources
Disclosure Control by Computer Scientists: An Overview and an Application of Microaggregation to Mobility Data Anonymization
Privacy-preserving data mining (PPDM) is a subdiscipline of computer science which in many respects is parallel to statistical disclosure control (SDC) within statistics. See [12] for a survey of recent developments in PPDM. We focus here on the connections between k-anonymity, a concept that arose in the PPDM community, and microaggregation, a family of methods developed within SDC. This is discus...
Anonymization of Trajectory Data
Trajectories of mobile objects are automatically collected in huge quantities. Publishing and exploiting such data is essential to improve planning, but it threatens the privacy of individuals: re-identification of the individual behind a trajectory is easy unless precautions are taken. We present two heuristics for privacy-preserving publication of trajectories. Both of them publish only true...
Beyond Multivariate Microaggregation for Large Record Anonymization
Microaggregation is one of the most commonly employed microdata protection methods. The basic idea of microaggregation is to anonymize data by aggregating original records into small groups of at least k elements and, therefore, preserving k-anonymity. Usually, in order to avoid information loss, when records are large, i.e., the number of attributes of the data set is large, this data set is s...
Utility preserving query log anonymization via semantic microaggregation
Query logs are of great interest for scientists and companies for research, statistical and commercial purposes. However, the availability of query logs for secondary uses raises privacy issues since they allow the identification and/or revelation of sensitive information about individual users. Hence, query anonymization is crucial to avoid identity disclosure. To enable the publication of pri...
Optimal Multivariate 2-Microaggregation for Microdata Protection: A 2-Approximation
Microaggregation is a special clustering problem where the goal is to cluster a set of points into groups of at least k points in such a way that groups are as homogeneous as possible. Microaggregation arises in connection with anonymization of statistical databases for privacy protection (k-anonymity), where points are assimilated to database records. A usual group homogeneity criterion is wit...