Bayesian Clustering with Outliers and Missing Values
Abstract
The Bayesian Robust Mixture Model (BRMM) is a fully probabilistic model for grouping real-valued data into a finite number of clusters. The model is robust in the sense that it tolerates outliers in the data, and it also handles missing values; both are treated within the Bayesian inference framework.
Foreword
The purpose of this report is to provide a detailed, step-by-step derivation of the variational update equations for the Bayesian Robust Mixture Model (BRMM). Essential background material on probability distributions and variational approximations is provided in section 1.
1 Background
1.1 Probability distributions
This sub-section summarizes the definitions and basic properties of the probability distributions we will need for the model. Please note that most of these distributions are parameterized differently than in the literature. My departure from conventional parameterizations allows for a more intuitive interpretation of the model's hyper-parameters. For convenience, I have summarized them in table 1.
The categorical family
A categorical variable is a random variable that takes values in a finite set. Let $\pi = (\pi_1, \dots, \pi_m)$ be a vector on the standard $(m-1)$-simplex,
$$\Delta^{m-1} = \left\{ x = (x_1, \dots, x_m) \in \mathbb{R}^m : 0 \le x_i \le 1 \ \forall i,\ \sum_{i=1}^{m} x_i = 1 \right\}.$$
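A minimal sketch of the categorical family, assuming the conventional parameterization in which $\pi_i$ is the probability of the $i$-th category. The report notes that its own parameterizations may differ from the usual ones, so this is illustrative only. The sketch checks that $\pi$ lies on $\Delta^{m-1}$ and draws samples with NumPy. The helper names `on_simplex`, `categorical_pmf` and `sample_categorical` are hypothetical, and categories are indexed $0, \dots, m-1$ rather than $1, \dots, m$.

```python
import numpy as np


def on_simplex(pi, tol=1e-9):
    """Hypothetical helper: True if pi lies on the standard (m-1)-simplex,
    i.e. every entry is in [0, 1] and the entries sum to 1 (within tol)."""
    pi = np.asarray(pi, dtype=float)
    return bool(
        np.all(pi >= -tol)
        and np.all(pi <= 1.0 + tol)
        and abs(pi.sum() - 1.0) <= tol
    )


def categorical_pmf(k, pi):
    """Hypothetical helper: P(X = k) for a categorical variable with
    parameter pi, using 0-based category indices k in {0, ..., m-1}.
    Assumes the conventional parameterization P(X = k) = pi[k]."""
    pi = np.asarray(pi, dtype=float)
    if not on_simplex(pi):
        raise ValueError("pi must lie on the standard simplex")
    return float(pi[k])


def sample_categorical(pi, size, rng):
    """Hypothetical helper: draw `size` category indices in {0, ..., m-1},
    each chosen with probability pi[k]."""
    pi = np.asarray(pi, dtype=float)
    if not on_simplex(pi):
        raise ValueError("pi must lie on the standard simplex")
    return rng.choice(len(pi), size=size, p=pi)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pi = np.array([0.2, 0.5, 0.3])
    draws = sample_categorical(pi, 10_000, rng)
    # Empirical frequencies; these should be close to pi.
    print(np.bincount(draws, minlength=len(pi)) / len(draws))
```

Checking simplex membership with a tolerance, rather than exact equality, avoids rejecting valid vectors whose entries do not sum to exactly 1 in floating point.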
Related articles
Probabilistic Low-Rank Subspace Clustering
In this paper, we consider the problem of clustering data points into low-dimensional subspaces in the presence of outliers. We pose the problem using a density estimation formulation with an associated generative model. Based on this probability model, we first develop an iterative expectation-maximization (EM) algorithm and then derive its global solution. In addition, we develop two Bayesian ...
A BAYESIAN APPROACH TO COMPUTING MISSING REGRESSOR VALUES
In this article, Lindley's measure of average information is used to measure the information contained in incomplete observations on the vector of unknown regression coefficients [9]. This measure of information may be used to compute the missing regressor values.
Application of Multiple Imputation for Missing Values in Three-Way Three-Mode Multi-Environment Trial Data
It is a common occurrence in plant breeding programs to observe missing values in three-way three-mode multi-environment trial (MET) data. We proposed modifications of models for estimating missing observations for these data arrays, and developed a novel approach in terms of hierarchical clustering. Multiple imputation (MI) was used in four ways, multiple agglomerative hierarchical clustering,...
Performance Evaluation of Missing-Value Imputation Clustering Based on a Multivariate Gaussian Mixture Model
Background: It is challenging to deal with mixture models when missing values occur in clustering datasets. Methods and results: We propose a dynamic clustering algorithm based on a multivariate Gaussian mixture model that efficiently imputes missing values to generate a "pseudo-complete" dataset. Parameters from different clusters and missing values are estimated according to the maximum likel...
Statistical data preparation: management of missing values and outliers
Missing values and outliers are frequently encountered while collecting data. The presence of missing values reduces the data available to be analyzed, compromising the statistical power of the study, and eventually the reliability of its results. In addition, it causes a significant bias in the results and degrades the efficiency of the data. Outliers significantly affect the process of estima...