Building Lexicon for Sentiment Analysis from Massive Collection of HTML Documents
Authors
Abstract
Recognizing polarity requires a list of polar words and phrases. To build such a lexicon automatically, many studies have investigated (semi-)unsupervised methods of learning the polarity of words and phrases. In this paper, we explore the use of structural clues that can extract polar sentences from Japanese HTML documents, and build a lexicon from the extracted polar sentences. The key idea is to develop the structural clues so that they achieve extremely high precision at the cost of recall. To compensate for the low recall, we used a massive collection of HTML documents. Thus, we could prepare a sufficiently large polar sentence corpus.
Similar resources
A Supervised Method for Constructing Sentiment Lexicon in Persian Language
Due to the increasing growth of digital content on the internet and social media, sentiment analysis is one of the emerging fields. This problem, which deals with information extraction and knowledge discovery from textual data using natural language processing, has attracted the attention of many researchers. Construction of sentiment lexicon as a valuable language resource is one of the imp...
Sentiment extraction from financial public disclosure documents
We address the problem of extracting sentiment in financial public disclosure documents, and explore their effects on daily price movements. We take a collection of public disclosure forms submitted by four companies in the Turkish stock market. Using simple classification algorithms, we point to a significant correlation between the content of disclosure texts and the next day’s price directio...
Building Large-Scale Twitter-Specific Sentiment Lexicon : A Representation Learning Approach
In this paper, we propose to build large-scale sentiment lexicon from Twitter with a representation learning approach. We cast sentiment lexicon learning as a phrase-level sentiment classification task. The challenges are developing effective feature representation of phrases and obtaining training data with minor manual annotations for building the sentiment classifier. Specifically, we develo...
Sentiment Analysis Using a Novel Human Computation Game
In this paper, we propose a novel human computation game for sentiment analysis. Our game aims at annotating sentiments of a collection of text documents and simultaneously constructing a highly discriminative lexicon of positive and negative phrases. Human computation games have been widely used in recent years to acquire human knowledge and use it to solve problems which are infeasible to sol...
Bootstrapping Sentiment Labels For Unannotated Documents With Polarity PageRank
We present a novel graph-theoretic method for the initial annotation of high-confidence training data for bootstrapping sentiment classifiers. We estimate polarity using topic-specific PageRank. Sentiment information is propagated from an initial seed lexicon through a joint graph representation of words and documents. We report improved classification accuracies across multiple domains for the...