Creating a General Russian Sentiment Lexicon
Abstract
The paper describes the new Russian sentiment lexicon RuSentiLex. The lexicon was gathered from several sources: opinionated words from domain-oriented Russian sentiment vocabularies, slang and curse words extracted from Twitter, and objective words with positive or negative connotations from a news collection. Words in the lexicon that have different sentiment orientations in specific senses are linked to the appropriate concepts of RuThes, a thesaurus of the Russian language. All lexicon entries are classified according to four sentiment categories and three sources of sentiment (opinion, emotion, or fact). The lexicon can serve as a first version for the construction of domain-specific sentiment lexicons or be used for feature generation in machine-learning approaches. In this role, the RuSentiLex lexicon was used by the participants of the SentiRuEval-2016 Twitter reputation monitoring shared task and allowed them to achieve high results.
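As a rough illustration of the feature-generation use mentioned above, the following minimal sketch counts lexicon hits in a lemmatized text. It is not the authors' method or the actual RuSentiLex file format: the `TOY_LEXICON` entries, the polarity labels, and the `lexicon_features` helper are all hypothetical. Only the three sentiment sources (opinion, emotion, fact) are taken from the abstract.

```python
from collections import Counter

# Hypothetical toy entries: lemma -> (polarity, source).
# Polarity labels are assumed for illustration; the three sources
# (opinion, emotion, fact) are the ones named in the abstract.
TOY_LEXICON = {
    "good": ("positive", "opinion"),
    "hate": ("negative", "emotion"),
    "crisis": ("negative", "fact"),
}

def lexicon_features(lemmas, lexicon=TOY_LEXICON):
    """Count lexicon hits per polarity and per polarity/source pair."""
    counts = Counter()
    for lemma in lemmas:
        entry = lexicon.get(lemma)
        if entry is None:
            continue  # lemma is not in the lexicon
        polarity, source = entry
        counts[polarity] += 1
        counts[f"{polarity}_{source}"] += 1
    return dict(counts)

features = lexicon_features(["the", "crisis", "is", "not", "good"])
```

The resulting counts could be appended to other features (for example, bag-of-words vectors) before training a classifier. Note that this sketch ignores negation and word-sense disambiguation.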
Similar resources
A Supervised Method for Constructing Sentiment Lexicon in Persian Language
Due to the increasing growth of digital content on the internet and social media, sentiment analysis has become an emerging field. This problem, which deals with information extraction and knowledge discovery from textual data using natural language processing, has attracted the attention of many researchers. Construction of a sentiment lexicon as a valuable language resource is one of the imp...
Extraction of Russian Sentiment Lexicon for Product Meta-Domain
In this paper we consider a new approach to domain-specific sentiment lexicon extraction in Russian. We propose a set of statistical features and a combination of algorithms that can discriminate sentiment words in a specific domain. The extraction model is trained in the movie domain and then applied to other domains. We evaluate the quality of the obtained sentiment vocabularies intrinsically. Finall...
Уточнение русскоязычных словарей эмоциональной лексики с использованием тезауруса RuThes (Refinement of Russian Sentiment Lexicons Using RuThes Thesaurus)
The paper describes a combined approach to the extraction of a domain-specific sentiment lexicon. First, an initial version of a domain-specific lexicon is obtained by applying a supervised model. At the second stage, the ordered list of sentiment words is refined using the thesaurus information. This combined model is applied to several domains and at last the domain-specific sentiment lex...
Two-Step Model for Sentiment Lexicon Extraction from Twitter Streams
In this study we explore a novel technique for creating polarity lexicons from Twitter streams in Russian and English. To this end, we first filter subjective tweets using general domain-independent lexicons in each language. The subjective tweets are then used to extract domain-specific sentiment words. Relying on co-occurrence statistics of extracted words in a...
DomEx: Extraction of Sentiment Lexicons for Domains and Meta-Domains
In this paper we describe DomEx, a sentiment lexicon extractor that implements a new approach to domain-specific sentiment lexicon extraction. Sentiment lexicon extraction is based on a machine learning model comprising a set of statistical and linguistic features. The extraction model is trained in the movie domain and can then be applied to other domains. The system can work with var...