Convex Clustering with Exemplar-Based Models

نویسندگان

  • Danial Lashkari
  • Polina Golland
چکیده

Clustering is often formulated as the maximum likelihood estimation of a mixture model that explains the data. The EM algorithm widely used to solve the resulting optimization problem is inherently a gradient-descent method and is sensitive to initialization. The resulting solution is a local optimum in the neighborhood of the initial guess. This sensitivity to initialization presents a significant challenge in clustering large data sets into many clusters. In this paper, we present a different approach to approximate mixture fitting for clustering. We introduce an exemplar-based likelihood function that approximates the exact likelihood. This formulation leads to a convex minimization problem and an efficient algorithm with guaranteed convergence to the globally optimal solution. The resulting clustering can be thought of as a probabilistic mapping of the data points to the set of exemplars that minimizes the average distance and the information-theoretic cost of mapping. We present experimental results illustrating the performance of our algorithm and its comparison with the conventional approach to mixture model clustering.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Convex Clustering with Exemplar-Based Models

Clustering is often formulated as the maximum likelihood estimation of a mixture model that explains the data. The EM algorithm widely used to solve the resulting optimization problem is inherently a gradient-descent method and is sensitive to initialization. The resulting solution is a local optimum in the neighborhood of the initial guess. This sensitivity to initialization presents a signifi...

متن کامل

Convex Clustering with Exemplar-Based Models

Clustering is often formulated as the maximum likelihood estimation of a mixture model that explains the data. The EM algorithm widely used to solve the resulting optimization problem is inherently a gradient-descent method and is sensitive to initialization. The resulting solution is a local optimum in the neighborhood of the initial guess. This sensitivity to initialization presents a signifi...

متن کامل

Modified Convex Data Clustering Algorithm Based on Alternating Direction Method of Multipliers

Knowing the fact that the main weakness of the most standard methods including k-means and hierarchical data clustering is their sensitivity to initialization and trapping to local minima, this paper proposes a modification of convex data clustering  in which there is no need to  be peculiar about how to select initial values. Due to properly converting the task of optimization to an equivalent...

متن کامل

A Convex Exemplar-based Approach to MAD-Bayes Dirichlet Process Mixture Models

MAD-Bayes (MAP-based Asymptotic Derivations) has been recently proposed as a general technique to derive scalable algorithm for Bayesian Nonparametric models. However, the combinatorial nature of objective functions derived from MAD-Bayes results in hard optimization problem, for which current practice employs heuristic algorithms analogous to k-means to find local minimum. In this paper, we co...

متن کامل

An Evaluation of a Commercial CC-NUMA Architecture - The CONVEX Exemplar SPP1200

Studies done with academic CC-NUMA machines and simulators indicate a good potential for application performance. Our goal therefore, is to investigate whether the CONVEX Exemplar, a commercial distributed shared memory machine, lives up to the expected potential of CCNUMA machines. If not, we would like to understand what architectural or implementation decisions make it less efficient. On eva...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Advances in neural information processing systems

دوره 20  شماره 

صفحات  -

تاریخ انتشار 2007