Repeated Record Ordering for Constrained Size Clustering


Abstract:

Data clustering is one of the main techniques used in data mining, with many applications in computer science, biology, and the social sciences. Constrained clustering is a type of clustering in which side information provided by the user is incorporated into existing clustering algorithms. One well-researched form of constrained clustering is microaggregation. A microaggregation algorithm must divide the dataset into groups containing at least k members, where k is a user-defined parameter. The main application of microaggregation is in Statistical Disclosure Control (SDC) for privacy-preserving data publishing. The quality of a microaggregation algorithm is measured by the sum of within-group squared errors (SSE). The optimal microaggregation problem has been proved NP-hard in general, but the univariate special case can be solved optimally in polynomial time. Many heuristics for the general case build on the univariate case, and these techniques have to order the multivariate records in a sequence. This paper proposes a novel method for record ordering. Starting from a conventional clustering algorithm, the proposed method repeatedly puts the multivariate records into a sequence and then clusters them again. The process is repeated until no further improvement is achieved. Extensive experiments are carried out to confirm the effectiveness of the proposed method for different parameters and datasets.
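The abstract does not specify how records are ordered or how an ordered sequence is split into groups, so the following is only a minimal sketch of the general idea behind heuristics founded on the univariate case, not the paper's method. It assumes the records have already been placed in some sequence, and it uses dynamic programming to cut that sequence into consecutive groups that minimize SSE. The group-size range k to 2k-1 is an assumption borrowed from the univariate setting, and the names `group_sse` and `partition_sequence` are hypothetical.

```python
from typing import List, Sequence, Tuple

Record = Sequence[float]


def group_sse(group: List[Record]) -> float:
    """Sum of squared Euclidean distances from each record to the group centroid."""
    dim = len(group[0])
    centroid = [sum(r[d] for r in group) / len(group) for d in range(dim)]
    return sum((r[d] - centroid[d]) ** 2 for r in group for d in range(dim))


def partition_sequence(ordered: List[Record], k: int) -> Tuple[float, List[List[Record]]]:
    """Cut an already ordered sequence into consecutive groups of size k..2k-1
    (an assumed size range) so that the total SSE is minimal for that ordering."""
    n = len(ordered)
    if k < 1 or n < k:
        raise ValueError("need k >= 1 and at least k records")
    inf = float("inf")
    best = [inf] * (n + 1)   # best[i]: minimal SSE for the first i records
    cut = [-1] * (n + 1)     # cut[i]: start index of the last group ending at i
    best[0] = 0.0
    for end in range(k, n + 1):
        for size in range(k, min(2 * k - 1, end) + 1):
            start = end - size
            if best[start] == inf:
                continue  # the prefix before this group cannot be partitioned
            cost = best[start] + group_sse(ordered[start:end])
            if cost < best[end]:
                best[end] = cost
                cut[end] = start
    # Walk the stored cut points backwards to recover the groups.
    groups: List[List[Record]] = []
    end = n
    while end > 0:
        start = cut[end]
        groups.append(ordered[start:end])
        end = start
    groups.reverse()
    return best[n], groups


if __name__ == "__main__":
    records = [(1.0,), (2.0,), (3.0,), (10.0,), (11.0,)]
    total, groups = partition_sequence(records, k=2)
    print(total, groups)
```

A repeated-ordering scheme of the kind the abstract describes could call such a partitioning step inside a loop, re-ordering the records between calls and stopping when the SSE no longer decreases; how the re-ordering is done is not stated here and is left open in this sketch.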


similar resources

Partitioning Complex Networks via Size-Constrained Clustering

The most commonly used method to tackle the graph partitioning problem in practice is the multilevel approach. During a coarsening phase, a multilevel graph partitioning algorithm reduces the graph size by iteratively contracting nodes and edges until the graph is small enough to be partitioned by some other algorithm. A partition of the input graph is then constructed by successively transferr...


Size constrained clustering problems in fixed dimension

Clustering or cluster analysis [1] is a classical method in unsupervised learning and one of the most used techniques in statistical data analysis. Clustering has a wide range of applications in many areas like pattern recognition, medical diagnostics, data mining, biology, market research and image analysis among others. A cluster is a set of data points that in some sense are similar to each ...


Constrained Ordering

We investigate the problem of finding a total order of a finite set that satisfies various local ordering constraints. Depending on the admitted constraints, we provide an efficient algorithm or prove NP-completeness. To this end, we define a reduction technique and discuss its properties.


Size-constrained 2-clustering in the plane with Manhattan distance

We present an algorithm for the 2-clustering problem with cluster size constraints in the plane assuming ℓ1-norm, that works in O(n log n) time and O(n) space. Such a procedure also solves a full version of the problem, computing the optimal solutions for all possible constraints on cluster sizes. The algorithm is based on a separation result concerning the clusters of any optimal solution of th...


Evolving Variable-Ordering Heuristics for Constrained Optimisation

In this paper we present and evaluate an evolutionary approach for learning new constraint satisfaction algorithms, specifically for MAX-SAT optimisation problems. Our approach offers two significant advantages over existing methods: it allows the evolution of more complex combinations of heuristics, and it can identify fruitful synergies among heuristics. Using four different classes of MAX-S...


Parallel Genetic Algorithms for Constrained Ordering Problems

This paper proposes two different parallel genetic algorithms (PGAs) for constrained ordering problems. Constrained ordering problems are constraint optimization problems (COPs) for which it is possible to represent a candidate solution as a permutation of objects. A decoder is used to decode this permutation into an instantiation of the COP variables. Two examples of such constrained ordering prob...




Journal title

volume 33  issue 7


publication date 2020-07-01
