Topic-wise, Sentiment-wise, or Otherwise? Identifying the Hidden Dimension for Unsupervised Text Classification

نویسندگان

  • Sajib Dasgupta
  • Vincent Ng
چکیده

While traditional work on text clustering has largely focused on grouping documents by topic, it is conceivable that a user may want to cluster documents along other dimensions, such as the author’s mood, gender, age, or sentiment. Without knowing the user’s intention, a clustering algorithm will only group documents along the most prominent dimension, which may not be the one the user desires. To address this problem, we propose a novel way of incorporating user feedback into a clustering algorithm, which allows a user to easily specify the dimension along which she wants the data points to be clustered via inspecting only a small number of words. This distinguishes our method from existing ones, which typically require a large amount of effort on the part of humans in the form of document annotation or interactive construction of the feature space. We demonstrate the viability of our method on several challenging sentiment datasets.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A New Document Embedding Method for News Classification

Abstract- Text classification is one of the main tasks of natural language processing (NLP). In this task, documents are classified into pre-defined categories. There is lots of news spreading on the web. A text classifier can categorize news automatically and this facilitates and accelerates access to the news. The first step in text classification is to represent documents in a suitable way t...

متن کامل

A High-Performance Model based on Ensembles for Twitter Sentiment Classification

Background and Objectives: Twitter Sentiment Classification is one of the most popular fields in information retrieval and text mining. Millions of people of the world intensity use social networks like Twitter. It supports users to publish tweets to tell what they are thinking about topics. There are numerous web sites built on the Internet presenting Twitter. The user can enter a sentiment ta...

متن کامل

Identifying Features from Opinion Mining Using Fine-Grained Relational Topic Weighted Approach

-Opinion feature extraction is a sub problem of opinion mining analyzed at document, sentence, or even phrase (word) levels. Document-level (sentence-level) opinion mining is classified as overall subjectivity or sentiment, expressed in an individual review document. The existing approaches to opinion feature extraction depended on mining patterns from a particular evaluate corpus disregard non...

متن کامل

An Unsupervised Sentiment Classifier on Summarized or Full Reviews

These days web users searching for opinions expressed by others on a particular product or service PS can turn to review repositories, such as Epinions.com or Imdb.com. While these repositories often provide a high quantity of reviews on PS, browsing through archived reviews to locate different opinions expressed on PS is a time-consuming and tedious task, and in most cases, a very labor-intens...

متن کامل

A Comparative Study of Bayesian Models for Unsupervised Sentiment Detection

This paper presents a comparative study of three closely related Bayesian models for unsupervised document level sentiment classification, namely, the latent sentiment model (LSM), the joint sentimenttopic (JST) model, and the Reverse-JST model. Extensive experiments have been conducted on two corpora, the movie review dataset and the multi-domain sentiment dataset. It has been found that while...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2009