Computer Processing Of Arabic Script-Based Languages. Current State And Future Directions
Author
Abstract
Arabic script-based languages do not belong to a single language family, and therefore exhibit different linguistic properties. To name just a few: Arabic is primarily a VSO language whereas Farsi is an SVO and Urdu is an SOV language. Both Farsi and Urdu have light verbs whereas Arabic does not. Urdu and Arabic have grammatical gender while Farsi does not. There are, however, linguistic and non-linguistic factors that bring these languages together. On the linguistic side, these are the use of the Arabic script, the right-to-left writing direction, the absence of characters representing short vowels, and the complex word structure. Non-linguistic common properties that bind the majority of speakers of these languages include the Qur’an, which every Muslim has to recite in Arabic, the proximity of the countries where these languages are spoken, a common history and, to a large extent, a common culture and historical influx. It is not surprising, then, that the surge of interest in the study of these languages and the sudden availability of funding to support the development of computational applications that process data in these languages have come for all of them at the same time.
Similar resources
Off-line Arabic Handwritten Recognition Using a Novel Hybrid HMM-DNN Model
To facilitate the entry of data into the computer and its digitization, automatic recognition of printed texts and manuscripts is a considerable aid to many applications. Research on automatic document recognition started decades ago with the recognition of isolated digits and letters, and today, due to advancements in machine learning methods, efforts are being made to iden...
Tokenizing an Arabic Script Language
In any natural language processing project, the input text needs to undergo tokenization before morphological analysis or parsing. For Arabic script languages, tokenization faces more problems and plays a more crucial role in natural language processing (NLP) systems. In this work we elaborate on some of these problems and present solutions for them. T...
Introduction to Arabic Natural Language Processing
This book provides system developers and researchers in natural language processing and computational linguistics with the necessary background information for working with the Arabic language. The goal is to introduce Arabic linguistic phenomena and review the state-of-the-art in Arabic processing. The book discusses Arabic script, phonology, orthography, morphology, syntax and semantics, with...
A Study of Sindhi Related and Arabic Script Adapted languages Recognition
1. INTRODUCTION Character recognition for Roman-script languages, especially English, has come close to perfection and is considered one of the successful applications in the field of computer vision. Work on Arabic script and other scripts is continuing, but work on the languages that have adopted the Arabic script is very limited, while work on the Sindhi language is only in its early stages. T...
A Transcription Scheme for Languages Employing the Arabic Script Motivated by Speech Processing Application
This paper offers a transcription system for Persian, the target language in the Transonics project, a speech-to-speech translation system developed as part of the DARPA Babylon program (The DARPA Babylon Program; Narayanan, 2003). In this paper, we discuss the transcription systems needed for automated spoken language processing applications in Persian, which uses the Arabic script for writing. Th...