Supervised sentiment analysis in multilingual environments

Authors

  • David Vilares
  • Miguel A. Alonso
  • Carlos Gómez-Rodríguez
Abstract

This article tackles the problem of performing multilingual polarity classification on Twitter, comparing three techniques: (1) a multilingual model trained on a multilingual dataset, obtained by fusing existing monolingual resources, that does not need any language recognition step, (2) a dual monolingual model with perfect language detection on monolingual texts and (3) a monolingual model that acts based on the decision provided by a language identification tool. The techniques were evaluated on monolingual, synthetic multilingual and code-switching corpora of English and Spanish tweets. In the latter case we introduce the first code-switching Twitter corpus with sentiment labels. The samples are labelled according to two well-known criteria used for this purpose: the SentiStrength scale and a trinary scale (positive, neutral and negative categories). The experimental results show the robustness of the multilingual approach (1) and also that it outperforms the monolingual models on some monolingual datasets.
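
The three techniques compared in the abstract differ mainly in how a tweet is routed to a model. The following is a minimal sketch of that routing, not the authors' implementation: the classifier is a toy token-count voter rather than the supervised models the article evaluates, the training sentences are invented, and `guess_language`, `multilingual`, `dual_monolingual` and `pipeline` are hypothetical names introduced only for illustration.

```python
from collections import Counter, defaultdict

# Toy training data (hypothetical, not taken from the article's corpora).
EN = [("i love this", "positive"), ("i hate this", "negative")]
ES = [("me encanta esto", "positive"), ("odio esto", "negative")]


def train(samples):
    """Count how often each token occurs under each polarity label."""
    counts = defaultdict(Counter)
    for text, label in samples:
        for token in text.lower().split():
            counts[token][label] += 1
    return counts


def predict(model, text):
    """Add up the label counts of known tokens; 'neutral' if none is known."""
    votes = Counter()
    for token in text.lower().split():
        votes.update(model.get(token, {}))
    return votes.most_common(1)[0][0] if votes else "neutral"


en_model, es_model = train(EN), train(ES)
models = {"en": en_model, "es": es_model}
multi_model = train(EN + ES)  # (1) one model trained on the fused data


def guess_language(text):
    """Hypothetical stand-in for a language identification tool."""
    tokens = text.lower().split()
    en_hits = sum(t in en_model for t in tokens)
    es_hits = sum(t in es_model for t in tokens)
    return "en" if en_hits >= es_hits else "es"


def multilingual(text):
    return predict(multi_model, text)  # (1) no language step


def dual_monolingual(text, gold_language):
    return predict(models[gold_language], text)  # (2) language given


def pipeline(text):
    return predict(models[guess_language(text)], text)  # (3) language guessed
```

In this toy setting, a code-switched input such as "i odio this" contains tokens from both vocabularies. The fused model in `multilingual` can draw on all of them, whereas either monolingual model sees only the tokens of its own language.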

Similar resources

A Supervised Method for Constructing Sentiment Lexicon in Persian Language

Due to the growth of digital content on the internet and social media, sentiment analysis has become an emerging field. The problem, which deals with information extraction and knowledge discovery from textual data using natural language processing, has attracted the attention of many researchers. Construction of a sentiment lexicon as a valuable language resource is one of the imp...

Comparative Experiments for Multilingual Sentiment Analysis Using Machine Translation

Sentiment analysis is the Natural Language Processing (NLP) task dealing with sentiment detection and classification from text. Given the importance of user-generated contents on the recent Social Web, this task has received much attention from the NLP research community in the past years. Sentiment analysis has been studied in different types of texts and in the context of distinct domains. Ho...

Multilingual Opinion Holder and Target Extraction using Knowledge-Poor Techniques

We describe an approach to multilingual sentiment analysis, in particular opinion holder and opinion target extraction, which requires no annotated data and minimal language-specific input. The approach is based on unsupervised, knowledge-poor techniques which facilitate adaptation to new languages and domains. The system's results are comparable to those of supervised, language-specific system...

A Semi-Supervised Framework Based on a Self-Constructed Adaptive Lexicon for Persian Opinion Analysis

With the appearance of Web 2.0 and 3.0, users’ contributions to the WWW have created a huge amount of valuable expressed opinions. Given the difficulty or impossibility of manually analyzing such big data, sentiment analysis, as a branch of natural language processing, has received considerable attention. Unlike other (popular) languages, a limited number of research studies have been conducted in ...

W2VLDA: Almost Unsupervised System for Aspect Based Sentiment Analysis

With the increase of online customer opinions on specialised websites and social networks, automatic systems that help organise and classify customer reviews by domain-specific aspects/categories and sentiment polarity are more necessary than ever. Supervised approaches for Aspect Based Sentiment Analysis obtain good results for the domain/language they are trained on, but havin...

Journal:
  • Inf. Process. Manage.

Volume 53

Publication date: 2017