Greedy clustering of count data through a mixture of multinomial PCA
Similar resources
Clustering Images with Multinomial Mixture Models
In this paper, we propose a method for image clustering using multinomial mixture models. The mixture of multinomial distributions, often called multinomial mixture, is a probabilistic model mainly used for text mining. The effectiveness of multinomial distribution for text mining originates from the fact that words can be regarded as independently generated in the first approximation. In this ...
The clustering and classification data mining techniques in insurance fraud detection: the case of Iranian car insurance
Given the steadily growing incidence of fraud in the insurance sector, especially in automobile insurance, and its negative consequences for insurance companies, applying suitable and efficient methods to identify and detect fraud in this area is essential. Understanding the patterns in data on previously reported claims can help determine whether a damage claim is genuine or not. One of the most common and widely used ways of discovering patterns in data is the use of ...
The power and pitfalls of Dirichlet-multinomial mixture models for ecological count data
The Dirichlet-multinomial mixture model (DMM) and its extensions provide powerful new tools for interpreting the ecological dynamics underlying taxon abundance data. However, like many complex models, how effectively they capture the many features of empirical data is not well understood. In this work, we expand the DMM to an infinite mixture model (iDMM) and use posterior predictive distributi...
Is Multinomial PCA Multi-faceted Clustering or Dimensionality Reduction?
Discrete analogues to Principal Components Analysis (PCA) are intended to handle discrete or positive-only data, for instance sets of documents. The class of methods is appropriately called multinomial PCA because it replaces the Gaussian in the probabilistic formulation of PCA with a multinomial. Experiments to date, however, have been on small data sets, for instance, from early information r...
Model-based clustering of high-dimensional data streams with online mixture of probabilistic PCA
Model-based clustering is a popular tool which is renowned for its probabilistic foundations and its flexibility. However, model-based clustering techniques usually perform poorly when dealing with high-dimensional data streams, which are nowadays a frequent data type. To overcome this limitation of model-based clustering, we propose an online inference algorithm for the mixture of probabilisti...
Journal
Journal title: Computational Statistics
Year: 2020
ISSN: 0943-4062,1613-9658
DOI: 10.1007/s00180-020-01008-9