A New Implementation of the co-VAT Algorithm for Visual Assessment of Clusters in Rectangular Relational Data
Authors
Abstract
This paper presents a new implementation of the co-VAT algorithm. We assume we have an m × n matrix D whose elements are pair-wise dissimilarities between m row objects O_r and n column objects O_c. The union of these disjoint sets is the set O of N = m + n objects. Clustering tendency assessment is the process by which a data set is analyzed to determine the number(s) of clusters present. In 2007, the co-Visual Assessment of Tendency (co-VAT) algorithm was proposed for rectangular data such as these. co-VAT is a visual approach that addresses four clustering tendency questions: i) How many clusters are in the row objects O_r? ii) How many clusters are in the column objects O_c? iii) How many clusters are in the union of the row and column objects O_r ∪ O_c? And, iv) How many (co)-clusters are there that contain at least one object of each type? co-VAT first imputes pair-wise dissimilarity values among the row objects, giving the square relational matrix D_r, and among the column objects, giving the square relational matrix D_c, and then builds a larger square dissimilarity matrix D_r∪c. The clustering questions can then be addressed by applying the VAT algorithm to D_r, D_c, and D_r∪c; D is reordered by shuffling the reordering indices of D_r∪c. Subsequently, the co-VAT image of D may show a tendency for co-clusters (question iv). We first discuss a different way to construct this image, and then we extend a path-based distance transform, which is used in the iVAT algorithm, to co-VAT. The new algorithm, co-iVAT, shows dramatic improvement in the ability of co-VAT to show cluster tendency in rectangular dissimilarity data.
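The two building blocks named in the abstract, VAT reordering of a square dissimilarity matrix and a path-based distance transform, can be illustrated with a minimal Python sketch. This is not the paper's implementation: the function names `vat_order` and `path_based_transform` are hypothetical, the transform is written as a plain O(n^3) min-max closure rather than any faster recursion, and the imputation of D_r and D_c from the rectangular D is not shown because the abstract does not specify it.

```python
import numpy as np


def vat_order(D):
    """Return a VAT-style ordering of a square, symmetric dissimilarity matrix.

    Assumption: this follows the usual Prim-like description of VAT, starting
    from an endpoint of the largest dissimilarity and repeatedly appending
    the unordered object closest to the objects already ordered.
    """
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    start, _ = np.unravel_index(np.argmax(D), D.shape)
    order = [int(start)]
    remaining = set(range(n)) - {int(start)}
    # best[j] = smallest dissimilarity from j to any already-ordered object
    best = D[start].copy()
    while remaining:
        candidates = sorted(remaining)
        nxt = candidates[int(np.argmin(best[candidates]))]
        order.append(nxt)
        remaining.remove(nxt)
        best = np.minimum(best, D[nxt])
    return order


def path_based_transform(D):
    """Replace each dissimilarity by the min-max (bottleneck) path distance.

    For every pair (i, j) the result is the smallest achievable value of the
    largest edge along any path from i to j. Computed here with a
    Floyd-Warshall-style sweep over the (min, max) semiring.
    """
    P = np.asarray(D, dtype=float).copy()
    n = P.shape[0]
    for k in range(n):
        # Cost of routing i -> k -> j is the larger of the two legs.
        via_k = np.maximum(P[:, k][:, None], P[k, :][None, :])
        P = np.minimum(P, via_k)
    return P
```

A typical use, under the same assumptions, would be to transform a square matrix with `path_based_transform`, compute `order = vat_order(P)`, and display `P[np.ix_(order, order)]` as a gray-level image; dark blocks along the diagonal then suggest clusters.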
Similar resources
Clustering in Relational Data and Ontologies
This dissertation studies the problem of clustering objects represented by relational data. This is a pertinent problem as many real-world data sets can only be represented by relational data for which object-based clustering algorithms are not designed. Relational data are encountered in many fields including biology, management, industrial engineering, and social sciences. Unlike numerical ob...
Three Visual Cluster Validity Methods for Object and Relational Data
This talk is about three visual cluster validity methods developed by the authors that can be used for both object and relational data sets. The original method VAT (visual assessment of clustering tendency) works nicely for dissimilarity data up to about n = 5000 objects, but VAT quickly bumps up against storage and resolution limits, and is of limited utility for large data sets. The second m...
Data Clustring Using A New CGA(Chaotic-Generic Algorithm) Approach
Clustering is the process of dividing a set of input data into a number of subgroups. The members of each subgroup are similar to each other but different from members of other subgroups. The genetic algorithm has enjoyed many applications in clustering data. One of these applications is the clustering of images. The problem with the earlier methods used in clustering images was in selecting in...
An Algorithm for Clustering Tendency Assessment
The visual assessment of tendency (VAT) technique, developed by J.C. Bezdek, R.J. Hathaway and J.M. Huband, uses a visual approach to find the number of clusters in data. In this paper, we develop a new algorithm that processes the numeric output of VAT programs, other than gray level images as in VAT, and produces the tendency curves. Possible cluster borders will be seen as high-low patterns ...