Complexity Metric for Code-Mixed Social Media Text
Abstract
An evaluation metric is essential for measuring the performance of any system and the complexity of any data. In this paper, we discuss how to determine the level of complexity of code-mixed social media texts, which are growing rapidly due to multilingual interference. In general, texts written in multiple languages are often hard to comprehend and analyze. At the same time, to meet the demands of analysis, it is also necessary to determine the complexity of a particular document or text segment. Thus, we discuss the existing metrics for determining the code-mixing complexity of a corpus, along with their advantages and shortcomings, and propose several improvements to them. The new index better reflects the variety and complexity of a multilingual document. The index can also be applied to a sentence and seamlessly extended to a paragraph or an entire document. We employ two existing code-mixed corpora to suit the requirements of our study.
Related resources
SMPOST: Parts of Speech Tagger for Code-Mixed Indic Social Media Text
Use of social media has grown dramatically during the past few years. Users usually use informal language when communicating through social media. This language is often mixed in nature, where people transcribe their regional language with English. This style of writing is rapidly gaining popularity. Natural language processing (NLP) aims to infer the informat...
Revisiting Automatic Transliteration Problem for Code-Mixed Romanized Indian Social Media Text
Although automatic transliteration for Indian languages is a well-studied paradigm, available transliteration techniques fail in the Indian social media context due to phenomena such as wordplay, creative spelling, code-mixing, and phonetic romanized typing, all implying that transliteration for Indian social media text has to be revisited. The paper reports an initial study on automatic ...
POS Tagging of Hindi-English Code Mixed Text from Social Media: Some Machine Learning Experiments
We discuss Part-of-Speech (POS) tagging of Hindi-English Code-Mixed (CM) text from social media content. We propose extensions to the existing approaches and present a new feature set that addresses the transliteration problem inherent in social media. We achieve 84% accuracy with the new feature set. We show that the context and joint modeling of language detection and POS tag layers do...
Recurrent Neural Network based Part-of-Speech Tagger for Code-Mixed Social Media Text
This paper describes the Centre for Development of Advanced Computing's (CDACM) submission to the shared task 'Tool Contest on POS tagging for Code-Mixed Indian Social Media (Facebook, Twitter, and Whatsapp) Text', collocated with ICON-2016. The shared task was to predict the Part-of-Speech (POS) tag at the word level for a given text. Code-mixed text is generated mostly on social media by multilingual us...
Sentiment Identification in Code-Mixed Social Media Text
Sentiment analysis is the Natural Language Processing (NLP) task dealing with the detection and classification of sentiments in texts. While some tasks deal with identifying the presence of sentiment in text (subjectivity analysis), other tasks aim at determining the polarity of the text, categorizing it as positive, negative, or neutral. Whenever there is presence of sentiment in text, it has a s...
Journal title:
- Computación y Sistemas
Volume 21
Publication date: 2017