Improving K-Modes Algorithm Considering Frequencies of Attribute Values in Mode
Abstract
The original k-means algorithm is designed to work primarily on numeric data sets. This prevents the algorithm from being applied to categorical data clustering, which is an integral part of data mining and has attracted much attention recently. The k-modes algorithm extends the k-means paradigm to categorical data by using a frequency-based method to update the cluster modes, in place of the k-means approach of minimizing a numerically valued cost. However, the dissimilarity measure used in k-modes does not consider the relative frequencies of attribute values in each cluster mode. This weakens intra-cluster similarity, because less similar objects can be allocated to a cluster. In this paper, we present an experimental study on applying a new dissimilarity measure to k-modes clustering to improve its clustering accuracy. The measure is based on the idea that the similarity between a data object and a cluster mode is directly proportional to the sum of the relative frequencies of the mode values that the object shares. Experimental results on real-life datasets show that the modified algorithm is superior to the original k-modes algorithm with respect to clustering accuracy.
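The idea in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact formula, so the form used here is an assumption: a mismatching attribute contributes 1 to the dissimilarity, and a matching attribute contributes 1 minus the relative frequency of the mode value in the cluster. The function names (`cluster_mode_with_freqs`, `simple_matching`, `frequency_dissimilarity`) are hypothetical and chosen only for illustration.

```python
from collections import Counter


def cluster_mode_with_freqs(cluster):
    """Return (mode, freqs) for a non-empty list of equal-length categorical tuples.

    mode[j] is the most frequent value of attribute j in the cluster and
    freqs[j] is the relative frequency of that value within the cluster.
    """
    n = len(cluster)
    mode, freqs = [], []
    for column in zip(*cluster):  # iterate over attributes
        value, count = Counter(column).most_common(1)[0]
        mode.append(value)
        freqs.append(count / n)
    return tuple(mode), tuple(freqs)


def simple_matching(obj, mode):
    """Original k-modes dissimilarity: the number of mismatching attributes."""
    return sum(1 for x, m in zip(obj, mode) if x != m)


def frequency_dissimilarity(obj, mode, freqs):
    """Assumed frequency-aware variant (not necessarily the paper's formula).

    A mismatch costs 1; a match costs 1 - f, where f is the relative
    frequency of the mode value, so similarity grows with the sum of the
    relative frequencies of the shared values.
    """
    return sum((1.0 - f) if x == m else 1.0
               for x, m, f in zip(obj, mode, freqs))


# Toy cluster with two categorical attributes (illustrative data only).
cluster = [("a", "x"), ("a", "y"), ("a", "x"), ("b", "x")]
mode, freqs = cluster_mode_with_freqs(cluster)
print(mode, freqs)
print(simple_matching(("a", "z"), mode))
print(frequency_dissimilarity(("a", "z"), mode, freqs))
```

Under this assumed form, an object that matches the mode exactly still has a nonzero distance when the mode values are not unanimous in the cluster, which is how the measure separates tight clusters from loose ones.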
Similar Resources
Numerical Calculation of Resonant Frequencies and Modes of a Three-Atom Photonic Molecule and a Photonic Crystal in an External Cavity
In the present paper, resonant frequencies and modes of a three-atom photonic molecule and a photonic crystal placed within a cavity are numerically calculated. First, the governing formulation in transverse electric (TE) mode is obtained using Maxwell's equations. Then, an algorithm based on a finite difference scheme and matrix algebra is presented. The algorithm is then implemented in a comp...
A Multi-Mode Resource-Constrained Optimization of Time-Cost Trade-off Problems in Project Scheduling Using a Genetic Algorithm
In this paper, we present a genetic algorithm (GA) for optimization of a multi-mode resource-constrained time-cost trade-off (MRCTCT) problem. In the proposed GA, each activity has several operational modes, and each mode identifies a possible execution time and cost of the activity. Beyond earlier studies on the time-cost trade-off problem, in the MRCTCT problem, resource requirements of each execution mo...
Solving Best Path Problem on Multimodal Transportation Networks with Fuzzy Costs
Numerous algorithms have been proposed to solve the shortest-path problem; many of them consider a single-mode network and crisp costs. Other attempts have addressed the problem of fuzzy costs in a single-mode network, the so-called fuzzy shortest-path problem (FSPP). The main contribution of the present work is to solve the optimum path problem in a multimodal transportation network, in which the co...
Improved K-Modes for Categorical Clustering Using Weighted Dissimilarity Measure
K-Modes is an extension of the K-Means clustering algorithm, developed to cluster categorical data, where the mean is replaced by the mode. The similarity measure proposed by Huang is the simple matching (or mismatching) measure. The weights of attribute values contribute significantly to clustering; thus, in this paper we propose a new weighted dissimilarity measure for K-Modes, based on the ratio of frequency...
Analyzing the internal resonances and energy exchange between modes of power system considering Frequency – Energy dependence using Pseudo-Arclength and shooting algorithm
The power system nonlinearity and its profound impact on the individual states of power system is first evaluated and the interaction between their constituent modes during the occurrence of internal mode resonance (IMR) is discussed in this paper. A typical dynamical feature of nonlinear systems is the frequency-energy dependence of their states and their corresponding constituent modes which ...