A New Stemmer to Improve Information Retrieval
Abstract
Stemming is a technique that reduces words to their root form by removing derivational and inflectional affixes. It is widely used in information retrieval, and many researchers have demonstrated that it improves the performance of information retrieval systems. The Porter stemmer is the most common algorithm for English stemming. However, it has several drawbacks, since its simple rules cannot fully describe English morphology, and the errors it makes may affect retrieval performance. This paper proposes an improved version of the original Porter stemming algorithm for the English language. The proposed stemmer is evaluated using the error counting method, which measures the performance of a stemmer by counting its understemming and overstemming errors. The results show an improvement in stemming accuracy over the original stemmer, and also over other stemmers such as the Paice and Lovins stemmers. We also show that the new version of the Porter stemmer affects information retrieval performance.
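The error counting idea described in the abstract can be illustrated with a minimal sketch. It assumes a simplified pair-counting formulation: an understemming error is a pair of words from the same concept group that receive different stems, and an overstemming error is a pair of words from different concept groups that receive the same stem. The function names `toy_stem` and `count_errors`, the sample word groups, and the toy suffix rules are all hypothetical; the stemmer here is neither the Porter algorithm nor the stemmer proposed in the paper.

```python
from itertools import combinations


def toy_stem(word):
    """Hypothetical suffix stripper for illustration only (not Porter)."""
    for suffix in ("ing", "ed", "es", "s"):
        # Strip the first matching suffix, keeping at least 3 characters.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def count_errors(groups, stem):
    """Count understemming and overstemming errors over all word pairs.

    groups: list of lists; each inner list holds words that should
            share one stem (a manually built concept group).
    stem:   function mapping a word to its stem.
    """
    labelled = [(stem(w), gid) for gid, words in enumerate(groups) for w in words]
    under = over = same_pairs = diff_pairs = 0
    for (s1, g1), (s2, g2) in combinations(labelled, 2):
        if g1 == g2:
            same_pairs += 1
            under += int(s1 != s2)  # related words left unmerged
        else:
            diff_pairs += 1
            over += int(s1 == s2)   # unrelated words wrongly merged
    return {
        "under": under,
        "over": over,
        "same_pairs": same_pairs,
        "diff_pairs": diff_pairs,
        # Normalised indices; lower is better for both.
        "ui": under / same_pairs if same_pairs else 0.0,
        "oi": over / diff_pairs if diff_pairs else 0.0,
    }


# Assumed sample data: three hand-made concept groups.
groups = [
    ["car", "cars"],
    ["care", "caring", "cared"],
    ["connect", "connected", "connecting", "connection"],
]

result = count_errors(groups, toy_stem)
print(result)
```

On this sample, the toy rules conflate "caring" and "cared" with "car" (overstemming) and fail to conflate "connection" with "connect" (understemming), so both kinds of error appear in the counts. Comparing two stemmers then amounts to running `count_errors` with each one on the same groups.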
Related resources
MAULIK: An Effective Stemmer for Hindi Language
In this paper, a new stemmer named “Maulik” is proposed for the Hindi language. The stemmer is based purely on the Devanagari script and uses a hybrid approach (a combination of the brute-force and suffix-removal approaches). Stemming can be used to improve the effectiveness of information retrieval. The proposed stemmer is both computationally inexpensive and domain independent. The results are...
Investigating the effects of stemming on information retrieval in the Persian language
Exploiting language-specific behavior in information retrieval systems can significantly improve the quality of the retrieved results. The part of a word that remains after removing its affixes is called the stem. Stemming can be used to improve the relevance of the results in an information retrieval system. Different morphological variants of words (plural, past tense…) will be mapped into t...
A survey of stemming algorithms in information retrieval
Background. During the last fifty years, improved information retrieval techniques have become necessary because of the huge amount of information people have available, which continues to increase rapidly due to the use of new technologies and the Internet. Stemming is one of the processes that can improve information retrieval in terms of accuracy and performance. Aim. This paper provides a d...
A new hybrid stemming algorithm for Persian
Stemming has been an influential part of information retrieval and search engines. There have been tremendous efforts to build stemmers that are both efficient and accurate. Stemmers follow one of three methods: dictionary-based, statistical, or rule-based. This paper aims at building a hybrid stemmer that uses both the dictionary-based method and rule-based...
Statistical vs. Rule-Based Stemming for Monolingual French Retrieval
This paper describes our approach to the 2006 Adhoc Monolingual Information Retrieval run for French. The goal of our experiment was to compare the performance of a proposed statistical stemmer with that of a rule-based stemmer, specifically the French version of Porter’s stemmer. The statistical stemming approach is based on lexicon clustering, using a novel string distance measure. We submitt...