Adapter-based fine-tuning of pre-trained multilingual language models for code-mixed and code-switched text classification

Authors

Abstract

Code-mixing and code-switching are frequent features of online conversation. Classifying such text is challenging when one of the languages involved is low-resourced. Fine-tuning pre-trained multilingual language models (PMLMs) is a promising avenue for code-mixed text classification. In this paper, we explore adapter-based fine-tuning of PMLMs for code-mixed and code-switched (CMCS) text classification. We introduce sequential and parallel stacking of adapters, continuous fine-tuning of adapters, and training adapters without freezing the original model as novel techniques with respect to single-task classification. We also present a newly annotated dataset for the classification of Sinhala–English code-switched data, where Sinhala is a low-resourced language. Our dataset of 10,000 user comments has been manually annotated for five classification tasks: sentiment analysis, humor detection, hate speech identification, language identification, and aspect identification, thus making it the first publicly available Sinhala–English CMCS dataset with the largest number of task annotation types. In addition to this dataset, we tested our proposed techniques on Kannada–English and Hindi–English datasets. These experiments confirm that our adapter-based techniques outperform, or are on par with, basic fine-tuning of PMLMs.
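As a rough illustration of what adapter stacking looks like in practice, the sketch below uses the AdapterHub `adapters` library; the base model, adapter names, and head configuration are assumptions for illustration, not the paper's exact setup.

```python
# A minimal sketch of sequential and parallel adapter stacking with the
# AdapterHub `adapters` library. Model and adapter names are hypothetical;
# the paper's actual configuration may differ.
from adapters import AutoAdapterModel
from adapters.composition import Stack, Parallel

model = AutoAdapterModel.from_pretrained("xlm-roberta-base")

# One adapter per concern (names are illustrative).
model.add_adapter("cmcs_language")   # e.g., a language adapter
model.add_adapter("sentiment")       # e.g., a task adapter
model.add_classification_head("sentiment", num_labels=3)

# Sequential stacking: the language adapter's output feeds the task adapter.
model.active_adapters = Stack("cmcs_language", "sentiment")

# Parallel stacking would instead run both adapters side by side:
# model.active_adapters = Parallel("cmcs_language", "sentiment")

# train_adapter() activates the adapters and freezes the base model;
# to emulate training adapters *without* freezing the original model,
# re-enable gradients on the base model afterwards.
model.train_adapter(Stack("cmcs_language", "sentiment"))
for param in model.base_model.parameters():
    param.requires_grad = True
```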


Related papers

Fine-tuned Language Models for Text Classification

Transfer learning has revolutionized computer vision, but existing approaches in NLP still require task-specific modifications and training from scratch. We propose Fine-tuned Language Models (FitLaM), an effective transfer learning method that can be applied to any task in NLP, and introduce techniques that are key for fine-tuning a state-of-the-art language model. Our method significantly out...


Generating Code-switched Text for Lexical Learning

A vast majority of L1 vocabulary acquisition occurs through incidental learning during reading (Nation, 2001; Schmitt et al., 2001). We propose a probabilistic approach to generating code-mixed text as an L2 technique for increasing retention in adult lexical learning through reading. Our model takes as input a bilingual dictionary and an English text, and generates a code-switched text th...
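The generation idea can be sketched in a few lines; the flat replacement probability and simple dictionary lookup below are a simplification of the abstract's probabilistic model, not the authors' actual method.

```python
import random

def code_mix(tokens, bilingual_dict, p_replace=0.2, seed=0):
    """Toy probabilistic code-mixing: replace an English token with its
    dictionary translation with probability p_replace. A simplification
    of the paper's model, for illustration only."""
    rng = random.Random(seed)
    mixed = []
    for tok in tokens:
        translation = bilingual_dict.get(tok.lower())
        if translation is not None and rng.random() < p_replace:
            mixed.append(translation)
        else:
            mixed.append(tok)
    return mixed

# code_mix("the cat sat".split(), {"cat": "billi"}, p_replace=1.0)
# -> ['the', 'billi', 'sat']
```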


Dual Language Models for Code Mixed Speech Recognition

In this work, we present a new approach to language modeling for bilingual code-switched text. This technique, called dual language models, involves building two complementary monolingual language models and combining them using a probabilistic model for switching between the two. The objective of this technique is to improve generalization when the amount of code-switched training data is limi...
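A toy rendering of the switching idea, assuming two monolingual LMs that expose a prob(word, history) interface and a single switch probability; the paper's actual dual-LM formulation is more elaborate.

```python
def dual_lm_prob(word, history, lm_a, lm_b, lang_of, p_switch=0.1):
    """Toy two-LM mixture: stay in the language of the previous token with
    probability (1 - p_switch), otherwise switch to the other LM. Assumes
    lm_a/lm_b expose prob(word, history) and lang_of maps a token to 'a'/'b'."""
    prev_lang = lang_of(history[-1]) if history else "a"
    stay_lm, switch_lm = (lm_a, lm_b) if prev_lang == "a" else (lm_b, lm_a)
    return ((1.0 - p_switch) * stay_lm.prob(word, history)
            + p_switch * switch_lm.prob(word, history))
```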


DCU-UVT: Word-Level Language Classification with Code-Mixed Data

This paper describes the DCU-UVT team's participation in the Language Identification in Code-Switched Data shared task in the Workshop on Computational Approaches to Code Switching. Word-level classification experiments were carried out using a simple dictionary-based method, linear kernel support vector machines (SVMs) with and without contextual clues, and a k-nearest neighbour approach. Based...
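The dictionary-based baseline is simple enough to sketch directly; the two word lists and the fallback label below are assumptions for illustration, and real systems layer contextual clues on top.

```python
def dictionary_lid(token, lexicon_l1, lexicon_l2, fallback="ambiguous"):
    """Toy dictionary-based word-level language ID: label a token by which
    monolingual word list it appears in. lexicon_l1/lexicon_l2 are assumed
    to be sets of lowercased words."""
    t = token.lower()
    in_l1, in_l2 = t in lexicon_l1, t in lexicon_l2
    if in_l1 and not in_l2:
        return "lang1"
    if in_l2 and not in_l1:
        return "lang2"
    return fallback
```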


Natural Language Processing based Automatic Multilingual Code Generation

The Unified Modeling Language (UML) is widely used as a premier tool for modeling user requirements, and CASE tools built on it provide an easy way to obtain efficient solutions. This paper presents a natural language processing based automated system for generating code in multiple languages after modeling user requirements with UML. UML diagrams are first generated by analyzing the given business scenario pr...



Journal

Journal title: Knowledge and Information Systems

Year: 2022

ISSN: 0219-1377, 0219-3116

DOI: https://doi.org/10.1007/s10115-022-01698-1