Beyond Topicality: Finding Opinionated Chinese Documents

Authors

  • Yejun Wu
  • Douglas W. Oard
Abstract

The availability of Web 2.0 technologies has made it easy for information users to express their own opinions and access other people’s opinions on the Web. We are interested in understanding how opinions expressed in one way by one group compare to opinions expressed in another way by another group, especially in a different language. We have done reasonably well at finding opinionated English mailing lists and blogs, so we started to work on Chinese opinion classification. This paper reports on experiments with a recently released opinion classification test collection for Chinese sentences. Term-scale evidence from a large lexicon and from character-based estimation of semantic orientation for unknown words was used to construct classifiers for subjectivity and polarity that are somewhat more accurate than the best previously reported results. Subjectivity density and the relative predominance of terms with positive and negative semantic orientation were found to be useful features, and appropriate handling of negation was found to be important. With bilingual opinion classification techniques, we can help users find and compare opinions about a topic in two languages.
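The features named in the abstract (subjectivity density, the relative predominance of positive over negative terms, and negation handling) can be illustrated with a minimal sketch. This is not the authors' implementation: the tiny lexicon, the negator list, the one-token negation scope, and the names `sentence_features` and `SentenceFeatures` are all assumptions made for illustration, and the input is assumed to be already segmented into words.

```python
from dataclasses import dataclass

# Toy lexicon and negator list: hypothetical entries for illustration only,
# not the large lexicon used in the paper.
POSITIVE = {"好", "喜欢", "满意"}
NEGATIVE = {"差", "讨厌", "失望"}
NEGATORS = {"不", "没", "没有"}


@dataclass
class SentenceFeatures:
    subjectivity_density: float  # opinion-bearing terms / all tokens
    polarity_balance: float      # (positive - negative) / opinion-bearing terms


def sentence_features(tokens):
    """Compute two lexicon-based features for a pre-segmented sentence.

    Assumption: a negator flips the polarity of the token that
    immediately follows it, and nothing else.
    """
    pos = neg = 0
    negate = False
    for tok in tokens:
        if tok in NEGATORS:
            negate = not negate  # a double negator cancels out
            continue
        if tok in POSITIVE or tok in NEGATIVE:
            is_positive = (tok in POSITIVE) != negate
            if is_positive:
                pos += 1
            else:
                neg += 1
        negate = False  # negation scope ends after one non-negator token
    opinion_terms = pos + neg
    density = opinion_terms / len(tokens) if tokens else 0.0
    balance = (pos - neg) / opinion_terms if opinion_terms else 0.0
    return SentenceFeatures(density, balance)


if __name__ == "__main__":
    # "I do not like this movie": the negator flips a positive term.
    print(sentence_features(["我", "不", "喜欢", "这", "部", "电影"]))
    # "The service is very good": one positive term, no negation.
    print(sentence_features(["服务", "很", "好"]))
```

A classifier could threshold `subjectivity_density` to decide subjectivity and use the sign of `polarity_balance` for polarity; the paper's actual features, negation scope, and thresholds may differ.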

Similar resources

Sentence-Level Opinion Analysis for Chinese News Documents Based on Sentiment Information of Social Tags

Social tags have been considered to indirectly reflect authorized opinions of taggers. This paper proposes an unsupervised method which derives implicit sentiment information from social tags to decide, in one document, which sentences are opinionated, as well as to annotate them with proper polarity labels. First, for a social tag, its opinion degree is measured by aggregating the opinion degr...

UniNE at TREC 2008: Fact and Opinion Retrieval in the Blogsphere

This paper describes our participation in the Blog track at the TREC 2008 evaluation campaign. The Blog track goes beyond simple document retrieval; its main goal is to identify opinionated blog posts and assign a polarity measure (positive, negative or mixed) to these information items. Available topics cover various target entities, such as people, locations or products. This year’s...

A MEMs-based Labeling Approach to Punctuation Correction in Chinese Opinionated Text

This paper presents a maximum entropy models based approach to punctuation prediction and correction for Chinese opinionated texts. This study involves three parts. First, we conduct a survey of punctuation errors in Chinese opinionated texts based on a corpus of online product reviews. Then, we propose a maximum entropy sequence labeling approach to Chinese punctuation prediction. Finally, we ...

A Hybrid Method for Opinion finding Task (KUNLP at TREC 2008 Blog Track)

This paper presents an approach for the Opinion Finding task at TREC 2008 Blog Track. For the Ad-hoc Retrieval subtask, we adopt a language model to retrieve relevant documents. For the Opinion Retrieval subtask, we propose a hybrid model that combines a lexicon-based approach and a machine learning approach for estimating and ranking the opinionated documents. For the Polarized Opinion Retrieval subtask, we em...

University of Glasgow at TREC 2007: Experiments in Blog and Enterprise Tracks with Terrier

In TREC 2007, we participate in four tasks of the Blog and Enterprise tracks. We continue experiments using Terrier [14], our modular and scalable Information Retrieval (IR) platform, and the Divergence From Randomness (DFR) framework. In particular, for the Blog track opinion finding task, we propose a statistical term weighting approach to identify opinionated documents. An alternative approa...


Publication date: 2011