Nonnegative Matrix Factorization and Probabilistic Latent Semantic Indexing: Equivalence, Chi-Square Statistic, and a Hybrid Method
Authors
Abstract
Non-negative Matrix Factorization (NMF) and Probabilistic Latent Semantic Indexing (PLSI) have recently been applied successfully to document clustering. In this paper, we show that PLSI and NMF optimize the same objective function, although PLSI and NMF are different algorithms, as verified by experiments. This provides a theoretical basis for a new hybrid method that runs PLSI and NMF alternately, each successively jumping out of the local minima of the other method, thus achieving a better final solution. Extensive experiments on 5 real-life datasets show the relations between NMF and PLSI, and indicate that the hybrid method leads to significant improvements over NMF-only or PLSI-only methods. We also show that, at first-order approximation, NMF is identical to the χ² statistic.

Introduction

Document clustering has been widely used as a fundamental and effective tool for efficient organization, summarization, navigation and retrieval of large amounts of documents. Generally, document clustering problems are determined by three basic, tightly coupled components: a) the (physical) representation of the given data set; b) the criterion/objective function which the clustering solutions should aim to optimize; c) the optimization procedure (Li 2005). Among clustering methods, the K-means algorithm has been the most widely used. A recent development is Probabilistic Latent Semantic Indexing (PLSI). PLSI is an unsupervised learning method based on statistical latent class models and has been successfully applied to document clustering (Hofmann 1999). (PLSI was further developed into the more comprehensive Latent Dirichlet Allocation model (Blei, Ng, & Jordan 2003).) Nonnegative Matrix Factorization (NMF) is another recent development for document clustering. Initial work on NMF (Lee & Seung 1999; 2001) emphasizes that the factors contain coherent parts of the original data (images). Later work (Xu, Liu, & Gong 2003; Pauca et al. 2004) shows the usefulness of NMF for clustering in experiments on document collections, and a recent theoretical analysis (Ding, He, & Simon 2005) shows the equivalence between NMF and K-means / spectral clustering.

Despite significant research on both NMF and PLSI, few attempts have been made to establish the connections between them while highlighting their differences in the clustering framework. Gaussier and Goutte (Gaussier & Goutte 2005) made the first connection between NMF and PLSI, by showing that the local fixed-point solutions of the iterative procedures of PLSI and NMF are the same. Their proof is, however, incorrect. NMF and PLSI are different algorithms: they converge to different solutions even when starting from the same initial condition, as verified by experiments (see later sections). In this paper, we first show that both NMF and PLSI optimize the same objective function. This fundamental fact, together with the L1 normalization of NMF, ensures that NMF and PLSI are equivalent. Second, we show, by an example and extensive experiments, that NMF and PLSI are different algorithms and converge to different local minima. This leads to a new insight: NMF and PLSI are different algorithms for optimizing the same objective function. Third, we give a detailed analysis of the NMF and PLSI solutions. They are local minima of the same landscape in a very high-dimensional space. We show that PLSI can jump out of the local minima that NMF converges to, and vice versa.
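For concreteness, here is a minimal sketch of the shared objective in the form usually cited for this equivalence: the I-divergence objective of NMF and the log-likelihood of PLSI. The notation is generic, not necessarily the symbols used in the paper itself.

```latex
% NMF with the I-divergence objective: factorize F \approx W H, with W, H >= 0
\min_{W \ge 0,\; H \ge 0}\;
  J_{\mathrm{NMF}} \;=\; \sum_{i=1}^{m}\sum_{j=1}^{n}
  \left( F_{ij}\,\log\frac{F_{ij}}{(WH)_{ij}} \;-\; F_{ij} \;+\; (WH)_{ij} \right)

% PLSI: maximize the log-likelihood of the word--document co-occurrences
\max\;
  L_{\mathrm{PLSI}} \;=\; \sum_{i=1}^{m}\sum_{j=1}^{n}
  F_{ij}\,\log \sum_{k=1}^{K} P(w_i \mid z_k)\, P(z_k)\, P(d_j \mid z_k)

% When the total mass of WH is fixed to \sum_{ij} F_{ij} and the factors are
% L1-normalized, minimizing J_NMF and maximizing L_PLSI differ only by an
% additive constant, which is the equivalence referred to above.
```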
Based on this, we further propose a hybrid algorithm that runs NMF and PLSI alternately, jumping out of a series of local minima and finally reaching a much better minimum (a sketch of this alternation is given below). Extensive experiments show that this hybrid algorithm improves significantly over the standard NMF-only or PLSI-only algorithms.

Data Representations of NMF and PLSI

Suppose we have n documents and m words (terms). Let F = (F_ij) be the word-to-document matrix: F_ij = F(w_i, d_j) is the frequency of word w_i in document d_j. In this paper, we re-scale the term frequency F_ij by F_ij ← ...
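The following is a minimal, illustrative sketch of the hybrid alternation on such a word-to-document matrix F. It assumes the standard multiplicative updates for I-divergence NMF (Lee & Seung 2001) and the standard EM updates for PLSI (Hofmann 1999), with simple L1 normalizations to translate between the two parameterizations; all function and variable names are ours, not the paper's.

```python
# Illustrative sketch of alternating NMF and PLSI (not the authors' exact code).
import numpy as np

def nmf_idiv(F, W, H, iters=50, eps=1e-12):
    """Multiplicative updates minimizing the I-divergence D(F || WH)."""
    for _ in range(iters):
        WH = W @ H + eps
        W *= ((F / WH) @ H.T) / (H.sum(axis=1, keepdims=True).T + eps)
        WH = W @ H + eps
        H *= (W.T @ (F / WH)) / (W.sum(axis=0, keepdims=True).T + eps)
    return W, H

def plsi_em(F, Pw_z, Pz, Pd_z, iters=50, eps=1e-12):
    """EM updates maximizing the PLSI log-likelihood sum_ij F_ij log P(w_i, d_j)."""
    for _ in range(iters):
        # E-step: responsibilities P(z | w, d), shape (m, n, K)
        joint = np.einsum('ik,k,jk->ijk', Pw_z, Pz, Pd_z) + eps
        resp = joint / joint.sum(axis=2, keepdims=True)
        # M-step: re-estimate factors from expected counts
        Nk = np.einsum('ij,ijk->k', F, resp)
        Pw_z = np.einsum('ij,ijk->ik', F, resp) / (Nk + eps)
        Pd_z = np.einsum('ij,ijk->jk', F, resp) / (Nk + eps)
        Pz = Nk / Nk.sum()
    return Pw_z, Pz, Pd_z

def nmf_to_plsi(W, H, eps=1e-12):
    """L1-normalize an NMF factorization F ~ WH into PLSI-style probabilities."""
    Pw_z = W / (W.sum(axis=0, keepdims=True) + eps)       # P(w | z)
    Pd_z = (H / (H.sum(axis=1, keepdims=True) + eps)).T   # P(d | z)
    mass = W.sum(axis=0) * H.sum(axis=1)                  # unnormalized P(z)
    return Pw_z, mass / mass.sum(), Pd_z

def plsi_to_nmf(Pw_z, Pz, Pd_z, total):
    """Map PLSI probabilities back to NMF factors with the same total mass."""
    W = Pw_z * (total * Pz)   # columns carry the topic masses
    H = Pd_z.T                # rows of H sum to one
    return W, H

def hybrid(F, K=3, rounds=5, seed=0):
    """Alternate NMF and PLSI so each can escape the other's local minimum."""
    rng = np.random.default_rng(seed)
    m, n = F.shape
    W, H = rng.random((m, K)), rng.random((K, n))
    total = F.sum()
    for _ in range(rounds):
        W, H = nmf_idiv(F, W, H)
        Pw_z, Pz, Pd_z = plsi_em(F, *nmf_to_plsi(W, H))
        W, H = plsi_to_nmf(Pw_z, Pz, Pd_z, total)
    return W, H

if __name__ == "__main__":
    F = np.random.default_rng(1).integers(0, 5, size=(30, 20)).astype(float)
    W, H = hybrid(F, K=4)
    print("I-divergence of final fit:",
          np.sum(F * np.log((F + 1e-12) / (W @ H + 1e-12)) - F + W @ H))
```

The conversion steps are where the L1 normalization mentioned above enters: the column sums of W and row sums of H become the topic masses P(z), while the normalized columns and rows play the roles of P(w|z) and P(d|z).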
Similar articles
On the equivalence between Non-negative Matrix Factorization and Probabilistic Latent Semantic Indexing
Non-negative Matrix Factorization (NMF) and Probabilistic Latent Semantic Indexing (PLSI) have been successfully applied to document clustering recently. In this paper, we show that PLSI and NMF (with the I-divergence objective function) optimize the same objective function, although PLSI and NMF are different algorithms as verified by experiments. This provides a theoretical basis for a new hy...
Topic Modeling via Nonnegative Matrix Factorization on Probability Simplex
One important goal of document modeling is to extract a set of informative topics from a text corpus and produce a reduced representation of each document. In this paper, we propose a novel algorithm for this task based on nonnegative matrix factorization on a probability simplex. We further extend our algorithm by removing global and generic information to produce more diverse and specific top...
Note on Algorithm Differences Between Nonnegative Matrix Factorization And Probabilistic Latent Semantic Indexing
NMF and PLSI are two state-of-the-art unsupervised learning models in data mining, and both are widely used in many applications. Previous work has shown the equivalence between NMF and PLSI under some conditions. However, a new issue arises: why do they result in different solutions if they are equivalent? In other words, their algorithm differences have not yet been studied intensively. I...
Clustering and Latent Semantic Indexing Aspects of the Nonnegative Matrix Factorization
This paper provides theoretical support for the clustering aspect of nonnegative matrix factorization (NMF). By utilizing the Karush-Kuhn-Tucker optimality conditions, we show that the NMF objective is equivalent to a graph clustering objective, so the clustering aspect of NMF has a solid justification. Unlike previous approaches, which usually discard the nonnegativity constraints, our appr...
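As a point of reference, the flavor of such an equivalence can be sketched with the standard symmetric-NMF argument (cf. Ding, He, & Simon 2005); this is a generic illustration, not necessarily the derivation used in the paper above.

```latex
% Let A be a pairwise similarity (kernel) matrix and H >= 0 with H^T H = I_K.
\| A - H H^{\top} \|_F^2
  \;=\; \|A\|_F^2 \;-\; 2\,\operatorname{tr}\!\bigl(H^{\top} A H\bigr)
        \;+\; \underbrace{\| H H^{\top} \|_F^2}_{=\,K}
% The first and last terms are constants, so minimizing the symmetric NMF
% objective amounts to maximizing tr(H^T A H), the relaxed kernel K-means /
% graph association objective.
```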
A Projected Alternating Least square Approach for Computation of Nonnegative Matrix Factorization
Nonnegative matrix factorization (NMF) is a common method in data mining that has been used in different applications as a dimension reduction, classification or clustering method. Methods in the alternating least squares (ALS) approach are usually used to solve this non-convex minimization problem. At each step of ALS algorithms, two convex least squares problems should be solved, which causes high com...
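A minimal sketch of the projected-ALS idea that this abstract describes, under the usual scheme of solving two least squares problems per iteration and projecting onto the nonnegative orthant; the details are illustrative, not the authors' exact algorithm.

```python
# Projected alternating least squares for NMF (illustrative sketch).
import numpy as np

def nmf_projected_als(F, k, iters=100, seed=0):
    """Approximate F (m x n) as W @ H with W, H >= 0 via projected ALS."""
    rng = np.random.default_rng(seed)
    m, n = F.shape
    W = rng.random((m, k))
    for _ in range(iters):
        # Solve min_H ||F - W H||_F, then clip negative entries to zero
        H = np.linalg.lstsq(W, F, rcond=None)[0].clip(min=0)
        # Solve min_W ||F - W H||_F, then clip negative entries to zero
        W = np.linalg.lstsq(H.T, F.T, rcond=None)[0].T.clip(min=0)
    return W, H

F = np.abs(np.random.default_rng(1).standard_normal((50, 40)))
W, H = nmf_projected_als(F, k=5)
print("residual:", np.linalg.norm(F - W @ H))
```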