Towards a Principled Theory of Clustering
Abstract
To answer the question “Which clustering function should one use?” for a given task, we take an axiomatic approach to the theory of clustering, with a special focus on uniqueness theorems that characterize popular clustering functions. We argue that such theorems can be used to decide exactly when a particular clustering function should be used or avoided. We discuss abstract properties of clustering functions, following the framework of Kleinberg [Kleinberg, 2003]. By altering one of Kleinberg’s axioms, we sidestep his impossibility result and arrive at a consistent set of axioms, with the aim of providing an axiomatic taxonomy of clustering paradigms. The main result of this paper is a set of abstract properties that characterize the Max-Sum and Single-Linkage clustering functions. These functions have traditionally been treated separately, as the principles motivating their use have never been unified. These uniqueness theorems can guide the user in deciding which of the two, if either, is appropriate for a particular task. Our results also provide a theoretical foundation for empirical observations of clustering performed by humans in [Dry et al., 2009].
Similar resources
Clustering of a Number of Genes Affecting in Milk Production using Information Theory and Mutual Information
Information theory is a branch of mathematics. Information theory is used in genetic and bioinformatics analyses and can be used for many analyses related to the biological structures and sequences. Bio-computational grouping of genes facilitates genetic analysis, sequencing and structural-based analyses. In this study, after retrieving gene and exon DNA sequences affecting milk yield in dairy ...
Information Theoretic Clustering using Kernel Density Estimation
In recent years, information-theoretic clustering algorithms have been proposed which assign data points to clusters so as to maximize the mutual information between cluster labels and data [1, 2]. Using mutual information for clustering has several attractive properties: it is flexible enough to fit complex patterns in the data, and allows for a principled approach to clustering without assumi...
Fuzzy Clustering Approach Using Data Fusion Theory and its Application To Automatic Isolated Word Recognition
In this paper, utilization of clustering algorithms for data fusion in decision level is proposed. The results of automatic isolated word recognition, which are derived from speech spectrograph and Linear Predictive Coding (LPC) analysis, are combined with each other by using fuzzy clustering algorithms, especially fuzzy k-means and fuzzy vector quantization. Experimental results show that the...
NGTSOM: A Novel Data Clustering Algorithm Based on Game Theoretic and Self-Organizing Map
Identifying clusters is an important aspect of data analysis. This paper proposes a novel data clustering algorithm to increase the clustering accuracy. A novel game theoretic self-organizing map (NGTSOM) and neural gas (NG) are used in combination with Competitive Hebbian Learning (CHL) to improve the quality of the map and provide a better vector quantization (VQ) for clustering data. Different ...
Using Greedy Clustering Method to Solve Capacitated Location-Routing Problem with Fuzzy Demands
In this paper, the capacitated location routing problem with fuzzy demands (CLRP_FD) is considered. In CLRP_FD, facility location problem (FLP) and vehicle routing problem (VRP) are observed simultaneously. Indeed the vehicles and the depots have a predefined capacity to serve the customerst...