Bayesian Rose Trees
نویسندگان
چکیده
Hierarchical structure is ubiquitous in data across many domains. There are many hierarchical clustering methods, frequently used by domain experts, which strive to discover this structure. However, most of these methods limit discoverable hierarchies to those with binary branching structure. This limitation, while computationally convenient, is often undesirable. In this paper we explore a Bayesian hierarchical clustering algorithm that can produce trees with arbitrary branching structure at each node, known as rose trees. We interpret these trees as mixtures over partitions of a data set, and use a computationally efficient, greedy agglomerative algorithm to find the rose trees which have high marginal likelihood given the data. Lastly, we perform experiments which demonstrate that rose trees are better models of data than the typical binary trees returned by other hierarchical clustering algorithms.
منابع مشابه
Discovering Non-binary Hierarchical Structures with Bayesian Rose Trees
Rich hierarchical structures are common across many disciplines, making the discovery of hierarchies a fundamental exploratory data analysis and unsupervised learning problem. Applications with natural hierarchical structure include topic hierarchies in text (Blei et al. 2010), phylogenies in evolutionary biology (Felsenstein 2003), hierarchical community structures in social networks (Girvan a...
متن کاملMATHEMATICAL ENGINEERING TECHNICAL REPORTS Design and Implementation of General Tree Skeletons
Trees are important datatypes that are often used in representing structured data such as XML. Though trees are widely used in sequential programming, it is hard to write efficient parallel programs manipulating trees of arbitrary shapes, because of their irregular and ill-balanced structures. In this paper, we propose a solution for them based on the skeletal approach, in particular for genera...
متن کاملExtracting Decision Trees from Diagnostic Bayesian Networks to Guide Test Selection
In this paper, we present a comparison of five different approaches to extracting decision trees from diagnostic Bayesian nets, including an approach based on the dependency structure of the network itself. With this approach, attributes used in branching the decision tree are selected by a weighted information gain metric computed based upon an associated D-matrix. Using these trees, tests are...
متن کاملParallel skeletons for manipulating general trees
Trees are important datatypes that are often used in representing structured data such as XML. Though trees are widely used in sequential programming, it is hard to write efficient parallel programs manipulating trees, because of their irregular and ill-balanced structures. In this paper, we propose a solution based on the skeletal approach. We formalize a set of skeletons (abstracted computati...
متن کامل