Stemming Methodologies Over Individual Query Words for an Arabic Information Retrieval System

نویسندگان

  • Hani Abu-Salem
  • Mahmoud Al-Omari
  • Martha W. Evens
چکیده

Stemming is one of the most important factors that affect the performance of information retrieval systems. This article investigates how to improve the performance of an Arabic Information Retrieval System (Arabic-IRS) by imposing the retrieval method over individual words of a query depending on the importance of the WORD, the STEM, or the ROOT of the query terms in the database. This method, called Mixed Stemming, computes term importance using a weighting scheme that uses the Term Frequency (TF) and the Inverse Document-Frequency (IDF), called TFxIDF. An extended version of the Arabic-IRS system is designed, implemented, and evaluated to reduce the number of irrelevant documents retrieved. The results of the experiment suggest that the proposed method outperforms the Word index method using the Binary scheme and the Word index method using the TFxIDF weighting scheme. It also outperforms the Stem index method using the Binary weighting scheme but does not outperform the Stem index method using the TFxIDF weighting scheme, and again it outperforms the Root index method using the Binary weighting scheme but does not outperform the Root index method using the TFxIDF weighting scheme.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

بررسی تأثیرات ریشه‌یابی در بازیابی اطلاعات در زبان فارسی

Using the language-specific behavior in information retrieval systems can improve the quality of the retrieved results significantly. Part of the word that remains after removing its affixes is called stem. Stemming process can be used for improving the relevancy of the results in information retrieval system. Different morphological variants of words (plural, past tense…) will be mapped into t...

متن کامل

Morphological Variation of Arabic Queries

Although it has been shown that in test collection based studies, stemming improves retrieval effectiveness in an information retrieval system, morphological variations of queries searching on the same topic are less well understood. This work examines the broad morphological variation that searchers of an Arabic retrieval system put into their queries. In this study, 15 native Arabic speakers ...

متن کامل

Improved Skips for Faster Postings List Intersection

Information retrieval can be achieved through computerized processes by generating a list of relevant responses to a query. The document processor, matching function and query analyzer are the main components of an information retrieval system. Document retrieval system is fundamentally based on: Boolean, vector-space, probabilistic, and language models. In this paper, a new methodology for mat...

متن کامل

Improved Skips for Faster Postings List Intersection

Information retrieval can be achieved through computerized processes by generating a list of relevant responses to a query. The document processor, matching function and query analyzer are the main components of an information retrieval system. Document retrieval system is fundamentally based on: Boolean, vector-space, probabilistic, and language models. In this paper, a new methodology for mat...

متن کامل

Towards Supporting Exploratory Search over the Arabic Web Content: The Case of ArabXplore

Due to the huge amount of data published on the Web, the Web search process has become more difficult, and it is sometimes hard to get the expected results, especially when the users are less certain about their information needs. Several efforts have been proposed to support exploratory search on the web by using query expansion, faceted search, or supplementary information extracted from exte...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • JASIS

دوره 50  شماره 

صفحات  -

تاریخ انتشار 1999