A Method to Convert Sana’ani Accent to Modern Standard Arabic

Authors

G. H. Al-Gaphari, Ph.D., Department of Computer Science, University of Sana’a

M. Al-Yadoumi, Ph.D., Department of Electrical Engineering, University of Sana’a

Abstract

This paper presents an efficient mechanism for converting the Sana’ani dialect to Modern Standard Arabic (MSA). The mechanism is based on morphological rules that relate the Sana’ani dialect to MSA; these rules drive the conversion of dialect text into its corresponding MSA form. The mechanism tokenizes the input dialect text and divides each token into a stem and its affixes. The affixes may be dialect affixes, MSA affixes, or both, and the stem may likewise be a dialect stem or an MSA stem. The mechanism therefore first applies a simple MSA stemmer and must account for these combinations; a dialect stemmer is then applied to the resulting token to extract the dialect affixes. At this point, the rules decide when an affix should be extracted. The experiment shows that the Sana’ani dialect exhibits three classes of distortion: prefix, suffix, and stem distortions. The algorithm normalizes these distortions using the morphological rules. For each rule, the mechanism checks whether the rule can be applied: if the rule's conditions are met, the dialect affix is replaced by its corresponding MSA affix. If there is no restriction on applying a rule related to a distorted stem, the rule can be regarded as a parallel corpus of the dialect and MSA. Finally, the experiment computes the distortion ratio of MSA in the Sana’ani dialect. In a Sana’ani dialect sample of 9386 words, 16.29% have distorted suffixes, 0.70% have distorted prefixes, and 2.17% contain distorted stems. These percentages relate only to the processed words.
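The pipeline summarized in the abstract (tokenize, split each token into stem and affixes, apply a rule only when its condition holds, then count distortions per class) can be illustrated with a minimal Python sketch. Everything named here is an assumption for illustration: the transliterated rules in `PREFIX_RULES`, `SUFFIX_RULES`, and `STEM_MAP` are invented placeholders, not rules from the paper, and the single condition `MIN_STEM` stands in for whatever rule conditions the authors actually use. The separate MSA stemming pass is omitted.

```python
from collections import Counter

# Hypothetical placeholder rules (dialect form -> MSA form).
# These are NOT the paper's rules; they only show the shape of the data.
PREFIX_RULES = {"sha": "sa"}
SUFFIX_RULES = {"ish": "ki"}
STEM_MAP = {"bsr": "nzr"}   # acts like a tiny dialect/MSA parallel lexicon
MIN_STEM = 3                # assumed rule condition: keep at least 3 stem letters


def convert_token(token):
    """Return (converted token, set of distortion classes found)."""
    kinds = set()
    prefix, stem, suffix = "", token, ""

    # Strip a dialect prefix only if the rule condition is met.
    for dialect, msa in PREFIX_RULES.items():
        if stem.startswith(dialect) and len(stem) - len(dialect) >= MIN_STEM:
            prefix, stem = msa, stem[len(dialect):]
            kinds.add("prefix")
            break

    # Same check for dialect suffixes.
    for dialect, msa in SUFFIX_RULES.items():
        if stem.endswith(dialect) and len(stem) - len(dialect) >= MIN_STEM:
            suffix, stem = msa, stem[:-len(dialect)]
            kinds.add("suffix")
            break

    # A distorted stem is replaced by direct lookup.
    if stem in STEM_MAP:
        stem = STEM_MAP[stem]
        kinds.add("stem")

    return prefix + stem + suffix, kinds


def convert_text(text):
    """Convert whitespace-separated tokens and report distortion percentages."""
    tokens = text.split()
    counts = Counter()
    converted = []
    for token in tokens:
        new_token, kinds = convert_token(token)
        converted.append(new_token)
        counts.update(kinds)
    total = len(tokens)
    ratios = {
        kind: (100.0 * counts[kind] / total if total else 0.0)
        for kind in ("prefix", "suffix", "stem")
    }
    return " ".join(converted), ratios


if __name__ == "__main__":
    print(convert_text("shaktb ktbish bsr ktb"))
```

In this sketch a token such as `shab` is left unchanged because stripping the prefix would leave fewer than `MIN_STEM` letters, which mirrors the idea that a rule fires only when its conditions are met. The percentages are computed over all tokens in the input, analogous to the per-class distortion ratios reported for the 9386-word sample.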

Similar resources

Resumptive Pronoun Detection for Modern Standard Arabic to English MT

Many languages, including Modern Standard Arabic (MSA), insert resumptive pronouns in relative clauses, whereas many others, such as English, do not, using empty categories instead. This discrepancy is a source of difficulty when translating between these languages because there are words in one language that correspond to empty categories in the other, and these words must either be inserted o...

Building A Modern Standard Arabic Corpus

Language Engineering, including Information Retrieval, Machine Translation and other Natural Language-related disciplines, is showing in recent years more interest in the Arabic language. Suitable resources for Arabic are becoming a vital necessity for the progress of this research. Until recently, only two Arabic corpora were commonly available for researchers: the AFP Arabic newswire from LDC...

How to Convert a Latin Handwriting Recognition System to Arabic

Arabic handwriting recognition systems have been evaluated in public at the ICDAR conferences in 2005 and 2007. This contest provides well defined training data and unpublished test data, which makes the performance of those systems well comparable among each other. The development of the winning system of 2007 is described here in some detail. It is an HMM-based recognition system for Latin sc...

A Method to Convert Thesauri to SKOS

Thesauri can be useful resources for indexing and retrieval on the Semantic Web, but often they are not published in RDF/OWL. To convert thesauri to RDF for use in Semantic Web applications and to ensure the quality and utility of the conversion a structured method is required. Moreover, if different thesauri are to be interoperable without complicated mappings, a standard schema for thesauri i...

Dialectal Arabic to English Machine Translation: Pivoting through Modern Standard Arabic

Modern Standard Arabic (MSA) has a wealth of natural language processing (NLP) tools and resources. In comparison, resources for dialectal Arabic (DA), the unstandardized spoken varieties of Arabic, are still lacking. We present ELISSA, a machine translation (MT) system for DA to MSA. ELISSA employs a rule-based approach that relies on morphological analysis, transfer rules and dictionaries in ...

Transforming Standard Arabic to Colloquial Arabic

We present a method for generating Colloquial Egyptian Arabic (CEA) from morphologically disambiguated Modern Standard Arabic (MSA). When used in POS tagging, this process improves the accuracy from 73.24% to 86.84% on unseen CEA text, and reduces the percentage of out-of-vocabulary words from 28.98% to 16.66%. The process holds promise for any NLP task targeting the dialectal varieties of Arabi...

Journal:
International Journal of Information Science and Management

Volume 8, Issue 1, pages 39-49
