Topic Modeling via Nonnegative Matrix Factorization on Probability Simplex

Abstract

One important goal of document modeling is to extract a set of informative topics from a text corpus and to produce a reduced representation of each document. In this paper, we propose a novel algorithm for this task based on nonnegative matrix factorization on a probability simplex. We further extend the algorithm by removing global and generic information to produce more diverse and specific topics. In contrast to other matrix factorization methods, such as latent semantic indexing by singular value decomposition, our model has a solid statistical foundation and is based on a generative model for a text corpus. In contrast to purely probabilistic approaches, such as probabilistic latent semantic indexing (pLSI) solved by Expectation-Maximization, our method relies on efficient block coordinate descent optimization. Experiments demonstrate that the new method generates more meaningful and diverse topics than pLSI and LDA, and that it converges faster.
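The abstract does not give the objective or the update rules, so the following is only a minimal sketch of the general idea under explicit assumptions: a squared Frobenius loss; a term-by-document matrix X whose columns are word distributions; a topic matrix W and a document-topic matrix H whose columns are both constrained to the probability simplex; and two-block coordinate descent in which each block takes one projected-gradient step with step size 1/L. The function names `project_columns_to_simplex` and `simplex_nmf` are hypothetical, the projection is the standard sort-based Euclidean projection onto the simplex, and the paper's extension for removing global and generic information is not modeled.

```python
import numpy as np


def project_columns_to_simplex(V):
    """Euclidean projection of each column of V onto the probability simplex
    {x : x >= 0, sum(x) = 1}, using the standard sort-based algorithm."""
    m, n = V.shape
    U = -np.sort(-V, axis=0)                  # each column sorted in descending order
    css = np.cumsum(U, axis=0) - 1.0          # cumulative sums minus the target total 1
    idx = np.arange(1, m + 1)[:, None]        # 1-based positions, broadcast over columns
    cond = U - css / idx > 0                  # True on a leading prefix of each column
    rho = m - np.argmax(cond[::-1], axis=0)   # length of that prefix, per column
    theta = css[rho - 1, np.arange(n)] / rho  # per-column shift
    return np.maximum(V - theta, 0.0)


def simplex_nmf(X, k, n_iter=200, seed=0):
    """Hypothetical sketch: approximate X (terms x documents) by W @ H, with the
    columns of W (topics) and H (document-topic weights) on the simplex.

    Assumes a 0.5 * ||X - W H||_F^2 loss and two-block coordinate descent,
    taking one projected-gradient step of size 1/L per block."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, k))
    W /= W.sum(axis=0)                        # start with column-stochastic factors
    H = rng.random((k, n))
    H /= H.sum(axis=0)
    losses = []
    for _ in range(n_iter):
        # Block 1: update W with H fixed. The gradient is (W H - X) H^T and its
        # Lipschitz constant is the spectral norm of H H^T.
        HHt = H @ H.T
        step_W = 1.0 / (np.linalg.norm(HHt, 2) + 1e-12)
        W = project_columns_to_simplex(W - step_W * (W @ HHt - X @ H.T))
        # Block 2: update H with W fixed. The gradient is W^T (W H - X) and its
        # Lipschitz constant is the spectral norm of W^T W.
        WtW = W.T @ W
        step_H = 1.0 / (np.linalg.norm(WtW, 2) + 1e-12)
        H = project_columns_to_simplex(H - step_H * (WtW @ H - W.T @ X))
        losses.append(0.5 * np.linalg.norm(X - W @ H, "fro") ** 2)
    return W, H, losses


# Usage on synthetic data drawn from the same factorized model.
rng = np.random.default_rng(42)
W_true = rng.random((20, 3))
W_true /= W_true.sum(axis=0)
H_true = rng.random((3, 15))
H_true /= H_true.sum(axis=0)
X = W_true @ H_true                           # each column: one document's word distribution
W, H, losses = simplex_nmf(X, k=3, n_iter=100)
print(f"loss after first sweep: {losses[0]:.3e}, after last: {losses[-1]:.3e}")
```

Because both factors are column-stochastic, W @ H is column-stochastic as well, which is why the sketch feeds in normalized word distributions rather than raw counts. A step of at most 1/L per block guarantees that the loss does not increase from one sweep to the next.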

Similar resources

A Modified Digital Image Watermarking Scheme Based on Nonnegative Matrix Factorization

This paper presents a modified digital image watermarking method based on nonnegative matrix factorization. First, the host image is factorized into the product of three nonnegative matrices. Then, the central matrix is transformed into the discrete cosine transform domain. The watermark is embedded in the low-frequency band of this matrix, and the inverse transform is then computed. Finally, watermarked ...

A Projected Alternating Least square Approach for Computation of Nonnegative Matrix Factorization

Nonnegative matrix factorization (NMF) is a common data-mining method that has been used in various applications for dimension reduction, classification, or clustering. Methods based on the alternating least squares (ALS) approach are usually used to solve this non-convex minimization problem. At each step of an ALS algorithm, two convex least-squares problems must be solved, which causes high com...

High-Recall Document Retrieval from Large-Scale Noisy Documents via Visual Analytics based on Targeted Topic Modeling

We present a visual analytics system for large-scale document retrieval tasks with high recall where any missing relevant documents can be critical. Our system utilizes a novel user-driven topic modeling called targeted topic modeling, a variant of nonnegative matrix factorization (NMF). Our system visualizes a topic summary in a treemap form and lets users keep relevant topics and incrementall...

Local Topic Discovery via Boosted Ensemble of Nonnegative Matrix Factorization

Nonnegative matrix factorization (NMF) has become increasingly popular for topic modeling of large-scale documents. However, the resulting topics often represent only general, and thus redundant, information about the data rather than minor but potentially meaningful information to users. To tackle this problem, we propose a novel ensemble model of nonnegative matrix factorization for discovering high...

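The projected ALS entry in the list above notes that each ALS step solves two convex least-squares problems. As a hedged illustration of the general ALS idea only, here is the basic projected ALS scheme for NMF: each half-step solves an unconstrained least-squares problem with `numpy.linalg.lstsq` and then clips negative entries to zero. This is the generic textbook variant, not the specific method proposed in that paper, whose details are truncated here; `projected_als` is a hypothetical name.

```python
import numpy as np


def projected_als(X, k, n_iter=50, seed=0):
    """Hypothetical sketch of basic projected ALS for X approx. W @ H, W, H >= 0.

    Each half-step solves an unconstrained least-squares problem and then
    projects onto the nonnegative orthant by clipping negative entries."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, k))
    H = np.zeros((k, n))                      # returned as-is only if n_iter == 0
    for _ in range(n_iter):
        # Least squares for H with W fixed: min_H ||W H - X||_F, then clip.
        H = np.maximum(np.linalg.lstsq(W, X, rcond=None)[0], 0.0)
        # Least squares for W with H fixed: min_W ||H^T W^T - X^T||_F, then clip.
        W = np.maximum(np.linalg.lstsq(H.T, X.T, rcond=None)[0].T, 0.0)
    return W, H


rng = np.random.default_rng(0)
X = rng.random((30, 4)) @ rng.random((4, 25))  # nonnegative data of rank at most 4
W, H = projected_als(X, k=4)
rel_err = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
print(f"relative reconstruction error: {rel_err:.3e}")
```

Clipping is what makes this a heuristic: unlike solving each nonnegative least-squares subproblem exactly, it does not guarantee that the objective decreases at every step.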

Publication date: 2013