Transforming Standard Arabic to Colloquial Arabic
Abstract
We present a method for generating Colloquial Egyptian Arabic (CEA) from morphologically disambiguated Modern Standard Arabic (MSA). When used in POS tagging, this process improves accuracy from 73.24% to 86.84% on unseen CEA text and reduces the percentage of out-of-vocabulary words from 28.98% to 16.66%. The process holds promise for any NLP task targeting the dialectal varieties of Arabic; e.g., this approach may provide a cheap way to leverage MSA data and morphological resources to create resources for colloquial Arabic-to-English machine translation. It can also considerably speed up the annotation of Arabic dialects.
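To put the reported figures in context, the following is a minimal sketch of how an out-of-vocabulary (OOV) percentage and a relative reduction are commonly computed. It is not the authors' evaluation code: the function names `oov_rate` and `relative_reduction` and the toy token lists are hypothetical, and the sketch assumes OOV is counted per token against the set of training word forms.

```python
def oov_rate(train_tokens, test_tokens):
    """Percentage of test tokens whose surface form never occurs in training."""
    vocab = set(train_tokens)
    if not test_tokens:
        return 0.0
    unseen = sum(1 for tok in test_tokens if tok not in vocab)
    return 100.0 * unseen / len(test_tokens)


def relative_reduction(before, after):
    """Relative reduction, in percent, when a rate drops from `before` to `after`."""
    return 100.0 * (before - after) / before


if __name__ == "__main__":
    # Toy placeholder tokens (hypothetical), not data from the paper.
    train = ["w1", "w2", "w3"]
    test = ["w1", "w4", "w4", "w2"]
    print(f"toy OOV rate: {oov_rate(train, test):.2f}%")

    # Figures taken from the abstract: tagging error is 100 minus accuracy.
    err_drop = relative_reduction(100 - 73.24, 100 - 86.84)
    oov_drop = relative_reduction(28.98, 16.66)
    print(f"relative error reduction: {err_drop:.1f}%")
    print(f"relative OOV reduction: {oov_drop:.1f}%")
```

Under these assumptions, the reported accuracy gain corresponds to a relative reduction in tagging error of about 50.8%, and the OOV drop to a relative reduction of about 42.5%.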
Similar resources
A Hybrid Approach for Converting Written Egyptian Colloquial Dialect into Diacritized Arabic
Recently, the volume of written colloquial text has increased dramatically. It is used as a medium for expressing ideas, especially across the WWW, usually in the form of blogs and partially colloquial articles. Most of this written colloquial text has been in the Egyptian colloquial dialect, which is considered the most widely understood and used dialect throughout the Arab world. Modern Standard...
An Opinion Analysis Tool for Colloquial and Standard Arabic
Social networks and users’ interactions are distinctive features of the current Web. They constitute a fundamental part of Web 2.0, where people produce, disseminate, and consume information in new interactive forms and users are no longer passive information receivers. Social media have succeeded in attracting a large portion of online users, which explains the explosive growth of social media in terms o...
An Investigation in Speech Recognition for Colloquial Arabic
This paper describes a study of grapheme-based speech recognition for colloquial Arabic. An investigation of language and acoustic model configurations is carried out to illustrate the differences between colloquial and modern standard Arabic (MSA) on the example of Levantine telephone conversations. The study defines extensive and carefully crafted data sets for different dialects and studies ...
CRF-based Diacritisation of Colloquial Arabic for Automatic Speech Recognition
Most of the available resources of colloquial Arabic speech are transcribed without diacritics. Those diacritics provide short vowels and other pronunciation information and by omitting them a considerable amount of ambiguity is introduced. In this paper, we propose the use of an automatic diacritisation method as front-end for training of automatic speech recognition systems of colloquial Arab...
A Baseline Speech Recognition System for Levantine Colloquial Arabic
The Arabic language is characterized by the existence of many different colloquial varieties that significantly differ from the standard Arabic form. In this paper, we propose a state-of-the-art speech recognition system for Levantine Colloquial Arabic (LCA). A fully continuous context dependent acoustic model was trained using 50 hours of speech from the BBN DARPA Babylon corpus. Pronunciation...