Extracting Topic Words and Clustering Documents by Probabilistic Graphical Models
نویسندگان
چکیده
ABSTRACT We present a method for lustering do uments and extra ting topi words of ea h luster using a probabilisti graphi al model. We maximize the likelihood of the model with the Expe tation Maximization algorithm. Our experiments demonstrate that the latent variables of the model an be seen as lusters of do uments and terms.
منابع مشابه
یک مدل موضوعی احتمالاتی مبتنی بر روابط محلّی واژگان در پنجرههای همپوشان
A probabilistic topic model assumes that documents are generated through a process involving topics and then tries to reverse this process, given the documents and extract topics. A topic is usually assumed to be a distribution over words. LDA is one of the first and most popular topic models introduced so far. In the document generation process assumed by LDA, each document is a distribution o...
متن کاملTopic Extraction from Text Documents Using Multiple-Cause Networks
This paper presents an approach to the topic extraction from text documents using probabilistic graphical models. Multiple-cause networks with latent variables are used and the Helmholtz machines are utilized to ease the learning and inference. The learning in this model is conducted in a purely data-driven way and does not require prespecified categories of the given documents. Topic words ext...
متن کاملModeling General and Specific Aspects of Documents with a Probabilistic Topic Model
Techniques such as probabilistic topic models and latent-semantic indexing have been shown to be broadly useful at automatically extracting the topical or semantic content of documents, or more generally for dimension-reduction of sparse count data. These types of models and algorithms can be viewed as generating an abstraction from the words in a document to a lower-dimensional latent variable...
متن کاملLatent Dirichlet Markov Allocation for Sentiment Analysis
In recent years probabilistic topic models have gained tremendous attention in data mining and natural language processing research areas. In the field of information retrieval for text mining, a variety of probabilistic topic models have been used to analyse content of documents. A topic model is a generative model for documents, it specifies a probabilistic procedure by which documents can be...
متن کاملIeee Transactions on Geoscience and Remote Sensing
Abstract—A novel model is presented to address the problem of semantic clustering of geo-objects in VHR panchromatic satellite images. The proposed model combines a probabilistic topic model with a multi-scale image representation into an automatic framework by embedding both document and scale selections. The probabilistic topic model is used to characterize the statistical distributions of ...
متن کامل