A Stemming Algorithm for the Portuguese Language
Abstract
Stemming algorithms are traditionally used in Information Retrieval with the goal of enhancing recall, as they conflate the variant forms of a word into a common representation. This paper describes the development of a simple and effective suffix-stripping algorithm for Portuguese. The stemmer is evaluated using a method proposed by Paice [9]. The results show that it performs significantly better than the Portuguese version of the Porter algorithm.
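To make the idea of suffix stripping concrete, here is a minimal Python sketch. It is not the algorithm described in the paper: the rule table, the minimum stem lengths, and the function name `strip_suffix` are all assumptions chosen for illustration, and a real stemmer would also need exception lists to avoid over-stemming.

```python
# Hypothetical, illustrative rules only (not the paper's rule set):
# (suffix, minimum stem length, replacement).
RULES = [
    ("mente", 4, ""),   # adverbs: "felizmente" -> "feliz"
    ("ões", 3, "ão"),   # plurals: "canções" -> "canção"
    ("ais", 2, "al"),   # plurals: "animais" -> "animal"
    ("s", 3, ""),       # plain plurals: "casas" -> "casa"
]


def strip_suffix(word: str) -> str:
    """Apply the first matching rule, trying longer suffixes first."""
    word = word.lower()
    for suffix, min_stem, replacement in sorted(RULES, key=lambda r: -len(r[0])):
        stem = word[: len(word) - len(suffix)]
        # Only strip when enough of the word remains to be a plausible stem.
        if word.endswith(suffix) and len(stem) >= min_stem:
            return stem + replacement
    return word
```

With these assumed rules, `strip_suffix("casas")` and `strip_suffix("casa")` both yield `"casa"`, which is the conflation of variant forms that improves recall.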
Similar resources
The Presence and Influence of English in the Portuguese Financial Media
As the lingua franca of the 21st century, English has become the main language for intercultural communication for those wanting to embrace globalization. In Portugal, it is the second language of most public and private domains influencing its culture and discourses. Language contact situations transform languages by the incorporations they make from other languages and Portugal has...
Stemming Strategies for European Languages
In this paper, we describe and evaluate different general stemming approaches for the French, Portuguese (Brazilian), German and Hungarian languages. Based on the CLEF test-collections, we demonstrate that light stemming approaches are quite effective for the French, Portuguese and Hungarian languages, and perform reasonably well for the German language. Variations in mean average precision amo...
A Study on the use of Stemming for Monolingual Ad-Hoc Portuguese Information Retrieval
For UFRGS’s first participation in CLEF our goal was to compare the performance of heavier and lighter stemming strategies using the Portuguese data collections for monolingual Ad-hoc retrieval. The results show that the safest strategy was to use the lighter alternative (reducing plural forms only). On a query-by-query analysis, full stemming achieved the highest improvement but also the bigge...
Report of MIRACLE Team for the Ad-Hoc Track in CLEF 2006
This paper presents the MIRACLE team's 2006 approach to the Ad-Hoc Information Retrieval track. The experiments for this campaign continue to test our IR approach. First, a baseline set of runs is obtained using standard components: stemming, transforming, filtering, entity detection and extraction, and others. Then, an extended set of runs is obtained using several types of combinations of...
Data Fusion for Effective European Monolingual Information Retrieval
For our fourth participation in the CLEF evaluation campaigns, our first objective was to propose an effective and general stopword list and a light stemming procedure for the Portuguese language. Our second objective was to obtain a better picture of the relative merit of various search engines when processing documents in the Finnish and Russian languages. Finally, based on the Z-score method...