The Impact of Translating Resource-Rich Datasets to Low-Resource Languages Through Multi-Lingual Text Processing
Abstract
Urdu is still considered a low-resource language despite being ranked as the world's 10th most spoken language, with nearly 230 million speakers. The scarcity of benchmark datasets in low-resource languages has led researchers to utilize more ingenious techniques to curb the issue. One such option, widely adopted, is to use translation services to replicate existing datasets from resource-rich languages, such as English, to low-resource languages, such as Urdu. For natural language processing tasks, including polarity assessment, words translated via Google Translator from one language to another often change meaning. This results in a polarity shift that degrades the system's performance, particularly for sentiment classification and emotion detection tasks. This study evaluates the effect of translation on the sentiment classification task from a resource-rich language to a low-resource language. It identifies the words causing polarity shift and enlists them into five distinct categories. It further finds the correlation between languages with similar roots. Our study shows a performance degradation of 2-3 percentage points due to polarity shift as a result of translation from resource-rich languages to low-resource languages.
Similar resources
Unsupervised Ranked Cross-Lingual Lexical Substitution for Low-Resource Languages
We propose an unsupervised system for a variant of cross-lingual lexical substitution (CLLS) to be used in a reading scenario in computer-assisted language learning (CALL), in which single-word translations provided by a dictionary are ranked according to their appropriateness in context. In contrast to most alternative systems, ours does not rely on either parallel corpora or machine translati...
Cross-Lingual Parser Selection for Low-Resource Languages
In multilingual dependency parsing, transferring delexicalized models provides unmatched language coverage and competitive scores, with minimal requirements. Still, selecting the single best parser for any target language poses a challenge. Here, we propose a lean method for parser selection. It offers top performance, and it does so without disadvantaging the truly low-resource languages. We c...
Cross-Lingual Morphological Tagging for Low-Resource Languages
Morphologically rich languages often lack the annotated linguistic resources required to develop accurate natural language processing tools. We propose models suitable for training morphological taggers with rich tagsets for low-resource languages without using direct supervision. Our approach extends existing approaches of projecting part-of-speech tags across languages, using bitext to infer ...
Sequence-based Multi-lingual Low Resource Speech Recognition
Techniques for multi-lingual and cross-lingual speech recognition can help in low resource scenarios, to bootstrap systems and enable analysis of new languages and domains. End-to-end approaches, in particular sequence-based techniques, are attractive because of their simplicity and elegance. While it is possible to integrate traditional multi-lingual bottleneck feature extractors as front-ends...
The Impact of Skopos on Syntactic Features of the Target Text
The present study is an experimental case study that investigates the impact, if any, of skopos on the syntactic features of the target text. Two test groups, each consisting of 10 MA students, translated a set of sentences selected from advertising texts in the operative and informative mode. The resulting target texts were then statistically analyzed in terms of the number of words, phrases, si...
Journal
Journal title: IEEE Access
Year: 2021
ISSN: 2169-3536
DOI: https://doi.org/10.1109/access.2021.3110285