Extended Similarity Trees
Author
Abstract
Trees are commonly used to represent proximity relations that emerge, for instance, from studies of classification, similarity, and identification. Trees are employed to describe the data, explore their structure, and model their generating process. They offer a convenient graphical display that is readily interpretable in terms of a hierarchy of clusters (Sokal & Sneath, 1963) or in terms of common and distinctive features (Tversky, 1977). The simplest tree structure is the hierarchical clustering model (Jardine & Sibson, 1971; Johnson, 1967) based on the ultrametric inequality, which states that for any triple of points the two larger distances are equal. That is, any three points can be labeled x, y, z such that d(x, z) = d(y, z) ≥ d(x, y). This assumption gives rise to a tree in which all the endpoints (leaves) are equally distant from the root. The ultrametric tree is highly restrictive because any two elements of one cluster must be equally similar to any element outside the cluster. This restriction is relaxed in the additive tree (e.g., Cunningham, 1978; Sattath & Tversky, 1977), where the leaves are not necessarily equidistant from the root. The additive tree provides greater flexibility than the ultrametric tree, but it too cannot accommodate (nonnested) overlapping clusters because any two clusters in a tree are either nested or disjoint. Throughout the paper we use the standard abbreviations (e.g., HICLUS, ADDTREE, ADCLUS) for scaling algorithms, and the unabbreviated forms (e.g., hierarchical clustering, additive tree, additive clustering) for the respective models. This article describes a new representation of proximity relations, called an extended tree, which accommodates nonnested feature structures while maintaining the basic property of a tree that every pair of points is joined by a unique path. To motivate and
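The ultrametric inequality described in the abstract can be checked directly on a distance matrix. The sketch below is illustrative only and is not taken from the paper: the function names and the example matrices are hypothetical. The additive-tree check uses the standard four-point condition (of the three pairwise sums for any four points, the two largest are equal), which the abstract does not state explicitly; it is included here as an assumption about how the additive model is usually characterized, and it presumes the input is a symmetric metric.

```python
from itertools import combinations


def is_ultrametric(d, tol=1e-9):
    """Three-point condition: in every triple, the two largest distances are equal."""
    for i, j, k in combinations(range(len(d)), 3):
        smallest, middle, largest = sorted((d[i][j], d[i][k], d[j][k]))
        if abs(largest - middle) > tol:
            return False
    return True


def is_additive(d, tol=1e-9):
    """Four-point condition: of the three pairwise sums, the two largest are equal."""
    for i, j, k, l in combinations(range(len(d)), 4):
        sums = sorted((d[i][j] + d[k][l], d[i][k] + d[j][l], d[i][l] + d[j][k]))
        if abs(sums[2] - sums[1]) > tol:
            return False
    return True


# Hypothetical example 1: two clusters {0, 1} and {2, 3}, all leaves equidistant from the root.
ultra = [
    [0, 2, 4, 4],
    [2, 0, 4, 4],
    [4, 4, 0, 2],
    [4, 4, 2, 0],
]

# Hypothetical example 2: path lengths in a tree with unequal leaf branches
# (leaves a, b joined at one internal node, c, d at another).
additive = [
    [0, 3, 6, 4],
    [3, 0, 7, 5],
    [6, 7, 0, 4],
    [4, 5, 4, 0],
]

print(is_ultrametric(ultra), is_additive(ultra))
print(is_ultrametric(additive), is_additive(additive))
```

The first matrix satisfies both conditions, since every ultrametric is also additive; the second satisfies only the four-point condition, which illustrates the sense in which the additive tree relaxes the ultrametric restriction.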
Similar resources
On pseudosimilarity in trees
Two vertices u and v in a graph G are said to be removal-similar if G\u ≅ G\v. Vertices which are removal-similar but not similar are said to be pseudosimilar. A characterization theorem is presented for trees (later extended to forests and block graphs) with pseudosimilar vertices. It follows from this characterization that it is not possible to have three or more mutually pseudosimilar vertic...
Computer simulation of tree development with random variations and probabilistic growth of branches
This paper presents the use of self similarity with random variations and probabilistic growth of branches as the basis of generating natural forms of trees. Bifurcation branching geometry has been considered as the fundamental element in the formation of tree structures and Honda’s model was used to generate the recursive branching patterns. In order to simulate the dynamic variations of natur...
Pólya Urn Models and Connections to Random Trees: A Review
This paper reviews Pólya urn models and their connection to random trees. Basic results are presented, together with proofs that underlie the historical evolution of the accompanying thought process. Extensions and generalizations are given according to chronology: • Pólya-Eggenberger’s urn • Bernard Friedman’s urn • Generalized Pólya urns • Extended urn schemes • Invertible urn schemes ...
Log-Poisson Statistics and Extended Self-Similarity in Driven Dissipative Systems
The Bak-Chen-Tang forest fire model (1) was proposed as a toy model of turbulent systems, where energy (in the form of trees) is injected uniformly and globally, but is dissipated (burns) locally. We review our previous results on the model (2; 3) and present our new results on the statistics of the higher-order moments for the spatial distribution of fires. We show numerically that the spatial...
A general framework for estimating similarity of datasets and decision trees: exploring semantic similarity of decision trees
Decision trees are among the most popular pattern types in data mining due to their intuitive representation. However, little attention has been given to the definition of measures of semantic similarity between decision trees. In this work, we present a general framework for similarity estimation that includes as special cases the estimation of semantic similarity between decision trees, as we...