Extracting the roots of Arabic words without removing affixes

Authors

  • Qussai Yaseen
  • Ismail Hmeidi
Abstract

Most research on Arabic root extraction focuses on removing affixes from Arabic words. This process adds processing overhead and may remove non-affix letters, which leads to incorrect roots being extracted. This paper proposes a new approach to this issue by introducing a new algorithm for extracting the roots of Arabic words. The proposed algorithm, called the Word Substring Stemming (WSS) algorithm, does not remove affixes during the extraction process. Instead, it generates the set of all substrings of an Arabic word and uses the Arabic roots file, the Arabic patterns file, and a concrete set of rules to extract the correct root from those substrings. Experiments show that the proposed approach is competitive, with an accuracy of 83.9%. Furthermore, its accuracy could be improved further: for about 9.9% of the tested words, the WSS algorithm retrieves (in most cases) two candidate roots, one of which is the correct root.
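A minimal Python sketch of the substring idea, under stated assumptions: "substring" is read here as a contiguous run of letters, `ROOTS` is a tiny hypothetical stand-in for the Arabic roots file, and the patterns file and the paper's rule set are not modelled at all. The hypothetical `naive_prefix_strip` helper illustrates the failure mode attributed to affix removal: stripping the conjunction prefix و from وصل ("arrived", root و-ص-ل) deletes a root letter, whereas a substring lookup never deletes letters.

```python
# Sketch only, not the authors' WSS implementation.
# ASSUMPTIONS: "substring" = contiguous run of letters; ROOTS is a hypothetical
# sample standing in for the Arabic roots file; no patterns file or rules used.

ROOTS = {"كتب", "درس", "وصل"}  # hypothetical sample of trilateral roots

NAIVE_PREFIXES = ("ال", "و")  # hypothetical prefix list for a naive stemmer


def all_substrings(word: str, min_len: int = 3, max_len: int = 4) -> set[str]:
    """Return every contiguous substring whose length is in [min_len, max_len]."""
    subs = set()
    for start in range(len(word)):
        for length in range(min_len, max_len + 1):
            end = start + length
            if end <= len(word):
                subs.add(word[start:end])
    return subs


def candidate_roots(word: str, roots: set[str] = ROOTS) -> list[str]:
    """Keep the substrings found in the root lexicon; no letter is ever stripped."""
    return sorted(s for s in all_substrings(word) if s in roots)


def naive_prefix_strip(word: str) -> str:
    """Hypothetical affix-removal baseline: drop the first matching prefix."""
    for prefix in NAIVE_PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= 2:
            return word[len(prefix):]
    return word


if __name__ == "__main__":
    print(candidate_roots("يكتبون"))    # "they write"
    print(candidate_roots("المدرسة"))   # "the school"
    print(naive_prefix_strip("وصل"))    # root letter و is wrongly removed
    print(candidate_roots("وصل"))       # root kept intact
```

Forms with infixes, such as كاتب ("writer"), do not contain their root as a contiguous substring, which is presumably where the patterns file and rules described in the abstract come in.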

Similar resources

Morphological Analysis and Diacritical Arabic Text Compression

Morphological analysis of Arabic words makes it possible to reduce the storage requirements of Arabic dictionaries, encode diacritical Arabic text more efficiently, and support faster spell-checking and more efficient optical character recognition. All these factors enable efficient storage and archiving of multilingual digital libraries that include Arabic texts. This paper presents a lossless compression algorithm based...

Stemming Arabic Conjunctions and Prepositions

Arabic is the fourth most widely spoken language in the world, and is characterised by a high rate of inflection. To cater for this, most Arabic information retrieval systems incorporate a stemming stage. Most existing Arabic stemmers are derived from English equivalents; however, unlike English, most affixes in Arabic are difficult to discriminate from the core word. Removing incorrectly ident...

Constructing and Using Broad-coverage Lexical Resource for Enhancing Morphological Analysis of Arabic

Broad-coverage language resources that provide prior linguistic knowledge should improve the accuracy and performance of NLP applications. We are constructing a broad-coverage lexical resource to improve the accuracy of morphological analyzers and part-of-speech taggers for Arabic text. Over the past 1200 years, many different kinds of Arabic language lexicons were constructed; these lexicons...

The productivity of a root-initial accenting suffix, [-zu]: Judgement studies

In many languages affixes can assign accents on roots to which they attach. Some previous studies have claimed that accents assigned by affixes universally fall on syllables next to the affixes (Kurisu 2001; Revithiadou 2008). Kawahara and Wolf (2010) document a newly-coined suffix which counterexemplifies this generalization: the new Japanese suffix [-zu] assigns an accent on root-initial syll...

Research Book Review (Literature) / "Through learning the soul stays sound": A Review of the Book A Glossary of Arabic Words and Compounds in the Shahnameh, Houshang Mohammadi Afshar

The most recent comprehensive and detailed study of the identification, description, and etymology of the Arabic vocabulary of the Shahnameh is the Dictionary of Arabic Words and Expressions of the Shahnameh, written by Dr. Sajjad Aydanlou. The book is based on the second edition of Khaleghi Motlagh's critical edition of the Shahnameh (1393), which is the most authoritative edition and the closest to the or...

Journal title:
  • J. Information Science

Volume 40

Publication date: 2014