Taxonomic Prediction with Tree-Structured Covariances

Authors

  • Matthew B. Blaschko
  • Wojciech Zaremba
  • Arthur Gretton

Abstract

Taxonomies have frequently been proposed in the literature to encode semantic relationships between classes. Such taxonomies have been used to improve classification results by increasing the statistical efficiency of learning, as similarities between classes can be used to increase the amount of relevant data during training. In this paper, we show how data-derived taxonomies may be used in a structured prediction framework, and compare the performance of learned and semantically constructed taxonomies. Structured prediction in this case is multi-class categorization under the assumption that categories are taxonomically related. We make three main contributions: (i) we prove the equivalence between tree-structured covariance matrices and taxonomies; (ii) we use this covariance representation to develop a highly computationally efficient optimization algorithm for structured prediction with taxonomies; (iii) we show that taxonomies learned from data using the Hilbert-Schmidt Independence Criterion (HSIC) often perform better than imputed semantic taxonomies. Source code for this implementation, as well as machine-readable learned taxonomies, is available for download from https://github.com/blaschko/tree-structured-covariance.
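The correspondence between taxonomies and tree-structured covariances can be illustrated with a small sketch built on our own assumptions: a taxonomy is stored as a hypothetical parent map with non-negative edge weights, and the covariance between two classes is taken to be the total weight of the edges shared by their root-to-leaf paths (so each diagonal entry is a class's full path length). The second half scores a taxonomy against data with the standard biased empirical HSIC estimator, tr(KHLH)/(n-1)^2, using a linear input kernel K and a label kernel L = Y B Y^T. This only illustrates the kind of criterion named in contribution (iii); it is not the authors' learning or optimization algorithm. All names (`TREE`, `FLAT`, `tree_covariance`, `hsic`, `label_kernel`) and the toy data are hypothetical.

```python
# Hypothetical toy taxonomy: node -> (parent, weight of the edge to the parent).
TREE = {
    "root": (None, 0.0),
    "animal": ("root", 1.0),
    "cat": ("animal", 0.5),
    "dog": ("animal", 0.5),
    "car": ("root", 1.5),
}
# Hypothetical flat (star-shaped) taxonomy: same leaf depths, no shared edges.
FLAT = {
    "root": (None, 0.0),
    "cat": ("root", 1.5),
    "dog": ("root", 1.5),
    "car": ("root", 1.5),
}
CLASSES = ["cat", "dog", "car"]


def edges_to_root(node, tree):
    """Non-root nodes on the path from `node` to the root; each names the edge to its parent."""
    edges = set()
    while tree[node][0] is not None:
        edges.add(node)
        node = tree[node][0]
    return edges


def tree_covariance(classes, tree):
    """B[i][j] = total weight of the edges shared by the root-to-leaf paths of classes i and j."""
    paths = [edges_to_root(c, tree) for c in classes]
    return [[sum((tree[e][1] for e in p & q), 0.0) for q in paths] for p in paths]


def hsic(K, L):
    """Biased empirical HSIC, tr(K H L H) / (n - 1)^2, with H the centering matrix."""
    n = len(K)
    row = [sum(r) / n for r in K]
    col = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    total = sum(row) / n
    # Kc = H K H, computed elementwise from the row, column and grand means.
    Kc = [[K[i][j] - row[i] - col[j] + total for j in range(n)] for i in range(n)]
    # tr(H K H L) equals tr(K H L H) by cyclicity of the trace.
    return sum(Kc[i][j] * L[j][i] for i in range(n) for j in range(n)) / (n - 1) ** 2


# Hypothetical toy data: 1-D features and their class labels.
X = [[1.0], [1.0], [1.0], [1.0], [-2.0], [-2.0]]
y = ["cat", "cat", "dog", "dog", "car", "car"]

# Linear kernel on the inputs.
K = [[sum(a * b for a, b in zip(u, v)) for v in X] for u in X]


def label_kernel(tree):
    """L = Y B Y^T for one-hot labels Y, i.e. L[a][b] = B[class of a][class of b]."""
    B = tree_covariance(CLASSES, tree)
    idx = [CLASSES.index(label) for label in y]
    return [[B[i][j] for j in idx] for i in idx]


print(tree_covariance(CLASSES, TREE))  # [[1.5, 1.0, 0.0], [1.0, 1.5, 0.0], [0.0, 0.0, 1.5]]
print(hsic(K, label_kernel(TREE)), hsic(K, label_kernel(FLAT)))  # 1.76 1.44
```

On this toy data the taxonomy that groups cat with dog scores higher than the flat taxonomy with the same diagonal, because cat and dog samples have similar features. Searching over tree-structured B for a high score is one way a data-derived taxonomy could be obtained; the paper's actual procedure may differ.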

Similar resources

Determining Difference in Evolutionary Variation of Bacterial RecA proteins vs 16SrRNA Genes by using 16s_Toxonomy Tree

Background and Aims: The rate of variation in various genes of a bacterial species is different during evolution. Therefore, in systematic bacterial studies many researchers compare the phylogenetic tree of a particular gene to the standard tree of an rRNA gene. Regarding the importance of 16SrRNA gene and the evolutional process of RecA protein family, we investigated the changes in the select...

Quantitative Comparison of Tree Pairs Resulted from Gene and Protein Phylogenetic Trees for Sulfite Reductase Flavoprotein Alpha-Component and 5S rRNA and Taxonomic Trees in Selected Bacterial Species

Introduction: FAD is the cofactor of FAD-FR protein family. Sulfite reductase flavoprotein alpha-component is one of the main enzymes of this family. Based on applications of this enzyme in biotechnology and industry, it was chosen as the subject of evolutionary studies in 19 specific species. Method: Gene and protein sequences of sulfite reductase flavoprotein alpha-component, 5S rRNA sequence...

Structured Output Prediction with Hierarchical Loss Functions for Seafloor Imagery Taxonomic Categorization

In this paper we study the challenging problem of seafloor imagery taxonomic categorization. Our contribution is threefold. First, we demonstrate that this task can be elegantly translated into a Structured SVM learning framework. Second, we introduce a taxonomic loss function in the structured output classification objective during learning that is shown to improve the performance over other l...

A Practical Perspective on Latent Structured Prediction for Coreference Resolution

Latent structured prediction theory proposes powerful methods such as Latent Structural SVM (LSSVM), which can potentially be very appealing for coreference resolution (CR). In contrast, only small work is available, mainly targeting the latent structured perceptron (LSP). In this paper, we carried out a practical study comparing for the first time online learning with LSSVM. We analyze the int...

Publication date: 2013