Comparing Ranking-based and Naive Bayes Approaches to Language Detection on Tweets
Abstract
This article describes two systems that participated in the TweetLID-2014 competition on language detection in tweets. The systems are based on two different strategies: ranked dictionaries and Naive Bayes classifiers. The results show that the ranked-dictionary approach performs better with small training corpora whose language distribution is similar to that of the test dataset, while a Naive Bayes algorithm improves the scores with large models, even when the training data are unbalanced with respect to the test dataset. The experiments also showed that models based on word unigrams outperform those based on character n-grams. In the final evaluation, the Naive Bayes classifier ranked first among the unconstrained systems (those trained with external sources) participating in the competition.
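The two strategies compared above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the tokenizer (lowercasing and stripping @mentions, hashtags, and URLs), the rank-based weight (n - r) / n used to score the ranked dictionaries, the add-one smoothing in the multinomial Naive Bayes model, and all function names (`word_unigrams`, `char_ngrams`, `train_ranked`, `classify_ranked`, `train_nb`, `classify_nb`) are hypothetical choices made for illustration. Both classifiers accept either word unigrams or character n-grams as features, mirroring the comparison in the abstract.

```python
import math
import re
from collections import Counter

# --- Feature extraction (assumed preprocessing, not the authors' exact pipeline) ---
NOISE_RE = re.compile(r"@\w+|#\w+|https?://\S+")  # mentions, hashtags, URLs
WORD_RE = re.compile(r"\w+")

def word_unigrams(text):
    """Lowercased word tokens after stripping tweet-specific noise."""
    return WORD_RE.findall(NOISE_RE.sub(" ", text.lower()))

def char_ngrams(text, n=3):
    """Alternative feature set: overlapping character n-grams of the cleaned tweet."""
    padded = " " + " ".join(word_unigrams(text)) + " "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

# --- Approach 1: ranked dictionaries (hypothetical scoring scheme) ---
def train_ranked(corpus, top_n=1000, featurize=word_unigrams):
    """corpus maps a language code to a list of training texts."""
    ranks = {}
    for lang, texts in corpus.items():
        counts = Counter(f for t in texts for f in featurize(t))
        # Rank 0 is the most frequent feature; only the top_n features are kept.
        ranks[lang] = {f: r for r, (f, _) in enumerate(counts.most_common(top_n))}
    return {"ranks": ranks, "top_n": top_n, "featurize": featurize}

def classify_ranked(text, model):
    feats = model["featurize"](text)
    n = model["top_n"]

    def score(lang):
        # Assumed weighting: a feature at rank r adds (n - r) / n; unseen features add 0.
        table = model["ranks"][lang]
        return sum((n - table[f]) / n for f in feats if f in table)

    return max(model["ranks"], key=score)

# --- Approach 2: multinomial Naive Bayes with add-one (Laplace) smoothing ---
def train_nb(corpus, featurize=word_unigrams):
    n_docs = sum(len(texts) for texts in corpus.values())
    priors, counts, totals, vocab = {}, {}, {}, set()
    for lang, texts in corpus.items():
        # Log prior estimated from the training class distribution.
        priors[lang] = math.log(len(texts) / n_docs)
        c = Counter(f for t in texts for f in featurize(t))
        counts[lang], totals[lang] = c, sum(c.values())
        vocab.update(c)
    return {"priors": priors, "counts": counts, "totals": totals,
            "v": len(vocab), "featurize": featurize}

def classify_nb(text, model):
    feats = model["featurize"](text)

    def log_posterior(lang):
        denom = model["totals"][lang] + model["v"]
        # Counter returns 0 for unseen features, so add-one smoothing keeps log() finite.
        return model["priors"][lang] + sum(
            math.log((model["counts"][lang][f] + 1) / denom) for f in feats)

    return max(model["priors"], key=log_posterior)

if __name__ == "__main__":
    # Toy corpus for illustration only; it is not the TweetLID training data.
    corpus = {
        "es": ["el gato come pescado", "la casa es grande", "el perro y la casa"],
        "en": ["the cat eats fish", "the house is big", "the dog and the house"],
    }
    tweet = "@user la casa del perro http://t.co/x"
    print(classify_ranked(tweet, train_ranked(corpus)))  # es
    print(classify_nb(tweet, train_nb(corpus)))          # es
```

With the toy corpus, both methods label the tweet `es`. Passing `featurize=char_ngrams` to `train_ranked` or `train_nb` switches either model to character trigrams. In this sketch the Naive Bayes log prior is estimated from the training class counts, so a training set whose language distribution differs from the test data affects the decision through that term, whereas the ranked-dictionary score has no prior at all.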
Similar resources
Comparing Approaches to Subjectivity Classification: A Study on Portuguese Tweets
In this paper, we compare lexicon-based and machine learning-based approaches to define the subjectivity of tweets in Portuguese. We tested SentiLex and WordAffectBR lexicons, and Sequential Minimal Optimization and Naive Bayes algorithms for this task. In our study, we used the Computer-BR corpus that contains messages about the technology area. We obtained better results using the Comprehensi...
SAIL: Sentiment Analysis using Semantic Similarity and Contrast Features
This paper describes our submission to SemEval2014 Task 9: Sentiment Analysis in Twitter. Our model is primarily a lexicon based one, augmented by some preprocessing, including detection of MultiWord Expressions, negation propagation and hashtag expansion and by the use of pairwise semantic similarity at the tweet level. Feature extraction is repeated for sub-strings and contrasting sub-string ...
Sentence Boundary Detection for Social Media Text
The paper presents a study on automatic sentence boundary detection in social media texts such as Facebook messages and Twitter micro-blogs (tweets). We explore the limitations of using existing rule-based sentence boundary detection systems on social media text, and as an alternative investigate applying three machine learning algorithms (Conditional Random Fields, Naïve Bayes, and Sequential ...
#WarTeam at SemEval-2017 Task 6: Using Neural Networks for Discovering Humorous Tweets
This paper presents the participation of #WarTeam in Task 6 of SemEval2017 with a system classifying humor by comparing and ranking tweets. The training data consists of annotated tweets from the @midnight TV show. #WarTeam’s system uses a neural network (TensorFlow) having inputs from a Naïve Bayes humor classifier and a sentiment analyzer.
Comparing Experiential Approaches: Structured Language Learning Experiences versus Conversation Partners for Changing Pre-Service Teacher Beliefs
Research has shown that language teachers’ beliefs are often difficult to change through education. Experiential learning may help, but more research is needed to understand how experiential approaches shape perceptions. This study compares two approaches, conversation partners (CONV) and structured language learning experiences (SLLE), integrated into a course in language acquisition. Partici...