The Impact of Translating Resource-Rich Datasets to Low-Resource Languages Through Multi-Lingual Text Processing

Abstract

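Urdu is still considered a low-resource language despite being ranked as the world's 10th most spoken language, with nearly 230 million speakers. The scarcity of benchmark datasets in low-resource languages has led researchers to utilize more ingenious techniques to curb the issue. One such option, widely adopted, is to use translation services to replicate existing datasets from resource-rich languages, such as English, to low-resource languages, such as Urdu. For natural language processing tasks, including polarity assessment, words translated via Google translator from one language to another often change meaning. This results in a polarity shift, causing degradation of the system's performance, particularly for sentiment classification and emotion detection tasks. This study evaluates the effect of translation on the sentiment classification task from a resource-rich language to a low-resource language. It identifies the words causing a polarity shift and enlists them into five distinct categories. It further finds the correlation between languages with similar roots. Our study shows a 2-3 percentage point degradation in sentiment classification performance due to polarity shift as a result of translation from resource-rich languages to low-resource languages.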

Similar resources

Unsupervised Ranked Cross-Lingual Lexical Substitution for Low-Resource Languages

We propose an unsupervised system for a variant of cross-lingual lexical substitution (CLLS) to be used in a reading scenario in computer-assisted language learning (CALL), in which single-word translations provided by a dictionary are ranked according to their appropriateness in context. In contrast to most alternative systems, ours does not rely on either parallel corpora or machine translati...

Cross-Lingual Parser Selection for Low-Resource Languages

In multilingual dependency parsing, transferring delexicalized models provides unmatched language coverage and competitive scores, with minimal requirements. Still, selecting the single best parser for any target language poses a challenge. Here, we propose a lean method for parser selection. It offers top performance, and it does so without disadvantaging the truly low-resource languages. We c...

Cross-Lingual Morphological Tagging for Low-Resource Languages

Morphologically rich languages often lack the annotated linguistic resources required to develop accurate natural language processing tools. We propose models suitable for training morphological taggers with rich tagsets for low-resource languages without using direct supervision. Our approach extends existing approaches of projecting part-of-speech tags across languages, using bitext to infer ...

Sequence-based Multi-lingual Low Resource Speech Recognition

Techniques for multi-lingual and cross-lingual speech recognition can help in low resource scenarios, to bootstrap systems and enable analysis of new languages and domains. End-to-end approaches, in particular sequence-based techniques, are attractive because of their simplicity and elegance. While it is possible to integrate traditional multi-lingual bottleneck feature extractors as front-ends...

The Impact of Skopos on Syntactic Features of the Target Text

The present study is an experimental case study which investigates the impacts, if any, of skopos on syntactic features of the target text. Two test groups, each consisting of 10 MA students, translated a set of sentences selected from advertising texts in the operative and informative mode. The resulting target texts were then statistically analyzed in terms of the number of words, phrases, si...

Journal

Journal title: IEEE Access

Year: 2021

ISSN: 2169-3536

DOI: https://doi.org/10.1109/access.2021.3110285