Extracting Topic Words and Clustering Documents by Probabilistic Graphical Models

نویسندگان

  • Hyung-Joo Shin
  • Byoung-Tak Zhang
چکیده

ABSTRACT We present a method for lustering do uments and extra ting topi words of ea h luster using a probabilisti graphi al model. We maximize the likelihood of the model with the Expe tation Maximization algorithm. Our experiments demonstrate that the latent variables of the model an be seen as lusters of do uments and terms.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

یک مدل موضوعی احتمالاتی مبتنی بر روابط محلّی واژگان در پنجره‌های هم‌پوشان

A probabilistic topic model assumes that documents are generated through a process involving topics and then tries to reverse this process, given the documents and extract topics. A topic is usually assumed to be a distribution over words. LDA is one of the first and most popular topic models introduced so far. In the document generation process assumed by LDA, each document is a distribution o...

متن کامل

Topic Extraction from Text Documents Using Multiple-Cause Networks

This paper presents an approach to the topic extraction from text documents using probabilistic graphical models. Multiple-cause networks with latent variables are used and the Helmholtz machines are utilized to ease the learning and inference. The learning in this model is conducted in a purely data-driven way and does not require prespecified categories of the given documents. Topic words ext...

متن کامل

Modeling General and Specific Aspects of Documents with a Probabilistic Topic Model

Techniques such as probabilistic topic models and latent-semantic indexing have been shown to be broadly useful at automatically extracting the topical or semantic content of documents, or more generally for dimension-reduction of sparse count data. These types of models and algorithms can be viewed as generating an abstraction from the words in a document to a lower-dimensional latent variable...

متن کامل

Latent Dirichlet Markov Allocation for Sentiment Analysis

In recent years probabilistic topic models have gained tremendous attention in data mining and natural language processing research areas. In the field of information retrieval for text mining, a variety of probabilistic topic models have been used to analyse content of documents. A topic model is a generative model for documents, it specifies a probabilistic procedure by which documents can be...

متن کامل

Ieee Transactions on Geoscience and Remote Sensing

 Abstract—A novel model is presented to address the problem of semantic clustering of geo-objects in VHR panchromatic satellite images. The proposed model combines a probabilistic topic model with a multi-scale image representation into an automatic framework by embedding both document and scale selections. The probabilistic topic model is used to characterize the statistical distributions of ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2007