Transforming Standard Arabic to Colloquial Arabic

Authors

  • Emad Mohamed
  • Behrang Mohit
  • Kemal Oflazer

Abstract

We present a method for generating Colloquial Egyptian Arabic (CEA) from morphologically disambiguated Modern Standard Arabic (MSA). When used in POS tagging, this process improves accuracy from 73.24% to 86.84% on unseen CEA text and reduces the percentage of out-of-vocabulary words from 28.98% to 16.66%. The process holds promise for any NLP task targeting the dialectal varieties of Arabic. For example, it may provide a cheap way to leverage MSA data and morphological resources to create resources for colloquial Arabic to English machine translation. It can also considerably speed up the annotation of Arabic dialects.
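The two figures in the abstract can be made concrete with a small sketch. It assumes a token-level definition of the out-of-vocabulary (OOV) rate, which the abstract does not spell out, and it uses a hypothetical word-for-word mapping (`msa_to_cea`) on invented toy data. This is not the authors' transformation method, which operates on morphologically disambiguated MSA; it only illustrates why adding generated colloquial forms to a vocabulary lowers the OOV rate, and how the reported accuracy gain translates into a relative error reduction.

```python
def oov_rate(test_tokens, vocabulary):
    """Token-level OOV rate: share of test tokens absent from the vocabulary."""
    if not test_tokens:
        raise ValueError("test_tokens must not be empty")
    unseen = sum(1 for tok in test_tokens if tok not in vocabulary)
    return unseen / len(test_tokens)


def relative_error_reduction(acc_before, acc_after):
    """Fraction of the original error removed; accuracies are percentages."""
    err_before = 100.0 - acc_before
    err_after = 100.0 - acc_after
    return (err_before - err_after) / err_before


# Toy data invented for illustration; none of it comes from the paper.
msa_vocab = {"ماذا", "أريد", "الآن", "كتاب"}

# Hypothetical MSA -> CEA lexical mapping (an assumption, standing in for
# the far richer morphological transformation the abstract describes).
msa_to_cea = {"ماذا": "إيه", "أريد": "عايز", "الآن": "دلوقتي"}

cea_test = ["عايز", "إيه", "دلوقتي", "كتاب", "كده"]

# Augment the MSA vocabulary with the generated colloquial forms.
augmented_vocab = msa_vocab | {msa_to_cea.get(w, w) for w in msa_vocab}

oov_before = oov_rate(cea_test, msa_vocab)
oov_after = oov_rate(cea_test, augmented_vocab)

# Accuracy figures taken from the abstract (73.24% -> 86.84%).
rer = relative_error_reduction(73.24, 86.84)

print(f"OOV before: {oov_before:.2f}, after: {oov_after:.2f}")
print(f"Relative error reduction: {rer:.1%}")
```

On the reported numbers, the tagging error falls from 26.76% to 13.16%, which is roughly a halving of the error rate.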

Similar resources

A Hybrid Approach for Converting Written Egyptian Colloquial Dialect into Diacritized Arabic

Recently the volume of written colloquial text has increased dramatically. It is being used as a medium for expressing ideas, especially across the WWW, usually in the form of blogs and partially colloquial articles. Most of this written colloquial text has been in the Egyptian colloquial dialect, which is considered the most widely understood and used dialect throughout the Arab world. Modern Standard...

An Opinion Analysis Tool for Colloquial and Standard Arabic

Social networks and users' interactions are distinctive features of the current Web. They constitute a fundamental part of Web 2.0, where people produce, disseminate, and consume information in new interactive forms and users are no longer only passive information receivers. Social media have succeeded in attracting a large portion of online users, which explains the explosive growth of social media in terms o...

An Investigation in Speech Recognition for Colloquial Arabic

This paper describes a study of grapheme-based speech recognition for colloquial Arabic. An investigation of language and acoustic model configurations is carried out to illustrate the differences between colloquial and modern standard Arabic (MSA) on the example of Levantine telephone conversations. The study defines extensive and carefully crafted data sets for different dialects and studies ...

CRF-based Diacritisation of Colloquial Arabic for Automatic Speech Recognition

Most of the available resources of colloquial Arabic speech are transcribed without diacritics. Those diacritics provide short vowels and other pronunciation information and by omitting them a considerable amount of ambiguity is introduced. In this paper, we propose the use of an automatic diacritisation method as front-end for training of automatic speech recognition systems of colloquial Arab...

A Baseline Speech Recognition System for Levantine Colloquial Arabic

The Arabic language is characterized by the existence of many different colloquial varieties that significantly differ from the standard Arabic form. In this paper, we propose a state-of-the-art speech recognition system for Levantine Colloquial Arabic (LCA). A fully continuous context dependent acoustic model was trained using 50 hours of speech from the BBN DARPA Babylon corpus. Pronunciation...

Publication date: 2012