Reducing Over-Weighting in Supervised Term Weighting for Sentiment Analysis

Authors

  • Haibing Wu
  • Xiaodong Gu

Abstract

Research on supervised term weighting has recently attracted growing attention in the fields of Traditional Text Categorization (TTC) and Sentiment Analysis (SA). Despite their impressive achievements, we show that existing methods suffer, to varying degrees, from the problem of over-weighting. Overlooked by prior studies, over-weighting is a new concept proposed in this paper. To address this problem, two regularization techniques, singular term cutting and a bias term, are integrated into our framework of supervised term weighting schemes. Using the concepts of over-weighting and regularization, we provide new insights into existing methods and present their regularized versions. Moreover, under the guidance of our framework, we develop a novel supervised term weighting scheme, regularized entropy (re). The proposed framework is evaluated on three datasets widely used in SA. The experimental results indicate that re achieves the best results in comparison with existing methods, and that the regularization techniques can significantly improve the performance of existing supervised weighting methods.
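The abstract gives no formulas, so the following is only a minimal Python sketch, under our own assumptions, of what over-weighting and the two regularization ideas might look like in practice. It uses a generic entropy-based weight (one minus the binary entropy of a term's positive/negative document counts). This is not the paper's re formula. The parameters `bias` (additive smoothing of both class counts) and `min_df` (a document-frequency cutoff) are hypothetical stand-ins for the bias term and singular term cutting.

```python
import math
from collections import Counter


def class_doc_counts(docs, labels):
    """Count, per term, how many positive (label 1) and negative documents contain it."""
    pos, neg = Counter(), Counter()
    for tokens, label in zip(docs, labels):
        target = pos if label == 1 else neg
        target.update(set(tokens))  # document frequency, not raw term frequency
    return pos, neg


def entropy_weight(a, c, bias=0.0):
    """Return 1 - binary entropy of the term's class distribution.

    a, c: number of positive / negative documents containing the term.
    bias: hypothetical smoothing constant added to both counts; bias > 0
    keeps rare, perfectly one-sided terms away from the maximum weight 1.0.
    """
    a, c = a + bias, c + bias
    p = a / (a + c)
    h = -sum(x * math.log2(x) for x in (p, 1 - p) if x > 0)
    return 1.0 - h


def supervised_weights(docs, labels, bias=0.0, min_df=1):
    """Map each term to a weight; terms seen in fewer than min_df documents are cut."""
    pos, neg = class_doc_counts(docs, labels)
    weights = {}
    for term in set(pos) | set(neg):
        a, c = pos[term], neg[term]
        if a + c < min_df:
            continue  # hypothetical cutting rule: drop rare terms entirely
        weights[term] = entropy_weight(a, c, bias)
    return weights
```

For example, with `docs = [["great", "plot"], ["great", "acting"], ["dull", "plot"], ["dull", "acting", "zzz"]]` and `labels = [1, 1, 0, 0]`, the unregularized weights give the one-off token "zzz" the same maximal weight 1.0 as "great". With `bias=1.0`, "great" (about 0.19) outranks "zzz" (about 0.08), and with `min_df=2`, "zzz" is dropped altogether.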

Similar Resources

Credibility Adjusted Term Frequency: A Supervised Term Weighting Scheme for Sentiment Analysis and Text Classification

We provide a simple but novel supervised weighting scheme for adjusting term frequency in tf-idf for sentiment analysis and text classification. We compare our method to baseline weighting schemes and find that it outperforms them on multiple benchmarks. The method is robust and works well on both snippets and longer documents.

Balancing between over-weighting and under-weighting in supervised term weighting

Supervised term weighting could improve the performance of text categorization. A way proven to be effective is to give more weight to terms with more imbalanced distributions across categories. This paper shows that supervised term weighting should not just assign large weights to imbalanced terms, but should also control the trade-off between over-weighting and under-weighting. Overweighting,...

Supervised Term Weighting Metrics for Sentiment Analysis in Short Text

Term weighting metrics assign weights to terms in order to discriminate the important terms from the less crucial ones. Due to this characteristic, these metrics have attracted growing attention in text classification and recently in sentiment analysis. Using the weights given by such metrics could lead to more accurate document representation which may improve the performance of the classifica...

Application of a clustering method on sentiment analysis

This article introduces a novel approach for sentiment analysis – the clustering-based sentiment analysis approach. By applying a TFIDF weighting method, a voting mechanism and importing term scores, an acceptable and stable clustering result can be obtained. The methodology has competitive advantages over the two existing types of approaches: symbolic techniques and supervised learning methods...

Overlap-based feature weighting: The feature extraction of Hyperspectral remote sensing imagery

Hyperspectral sensors provide a large number of spectral bands. This massive and complex data structure of hyperspectral images presents a challenge to traditional data processing techniques. Therefore, reducing the dimensionality of hyperspectral images without losing important information is a very important issue for the remote sensing community. We propose to use overlap-based feature weigh...

Publication date: 2014