Evaluation of Perstem: A Simple and Efficient Stemming Algorithm for Persian
Abstract
Persian is a challenging language in the field of NLP. Right-to-left orthography, complex morphology, complicated grammatical rules, and different forms of letters make it an interesting language for NLP research. In this paper we measure the effectiveness of a simple and efficient stemming algorithm, Perstem, on Persian information retrieval. Our experiments on the Hamshahri corpus at CLEF 2009 show that the Perstem algorithm greatly improved both precision (+91%) and recall (+43%).
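The gains above read as relative changes against an unstemmed baseline. The abstract does not say which precision and recall variants were measured, so the following is only a minimal sketch assuming plain set-based precision and recall for a single query; the function names and document IDs are hypothetical and are not taken from the Hamshahri experiments.

```python
def precision_recall(retrieved, relevant):
    """Set-based precision and recall for one query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def relative_change(baseline, improved):
    """Relative change in percent; the baseline must be non-zero."""
    return (improved - baseline) / baseline * 100.0


# Hypothetical relevance judgments and result lists (illustration only).
relevant = {"d1", "d2", "d3", "d4"}
unstemmed_run = ["d1", "d7", "d8", "d9"]
stemmed_run = ["d1", "d2", "d3", "d9"]

p0, r0 = precision_recall(unstemmed_run, relevant)
p1, r1 = precision_recall(stemmed_run, relevant)

print(f"precision: {p0:.2f} -> {p1:.2f} ({relative_change(p0, p1):+.0f}%)")
print(f"recall:    {r0:.2f} -> {r1:.2f} ({relative_change(r0, r1):+.0f}%)")
```

In a real evaluation these values would be averaged over all queries of the test collection rather than computed for one query.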
Similar resources
Ad Hoc Retrieval with the Persian Language
This paper describes our participation in the Persian ad hoc search during the CLEF 2009 evaluation campaign. In this task, we suggest using a light suffix-stripping algorithm for the Farsi (or Persian) language. The evaluations based on different probabilistic models demonstrated that our stemming approach performs better than a stemmer removing only the plural suffixes, or statistically bette...
A new hybrid stemming algorithm for Persian
Stemming has been an influential part of information retrieval and search engines. There have been tremendous endeavours to build stemmers that are both efficient and accurate. Stemmers can use one of three methods: dictionary-based, statistical-based, and rule-based. This paper aims at building a hybrid stemmer that uses both the dictionary-based method and rule-based...
Query Wikification: Mining Structured Queries From Unstructured Information Needs using Wikipedia-based Semantic Analysis
Combining the language model and inference network, as implemented in the Indri search engine, is an efficient and verified approach. In this retrieval model, the user's information need is expressed in Indri's Structural Query Language. Although the SQL allows expert users to richly represent their information needs, the complexity of SQLs makes them unpopular on the Web for ordina...
A New Method for Stemming in Persian Language Considering Exceptions
In this paper, a new algorithm for stemming in the Farsi language is presented. This stemmer is based on removing suffixes and prefixes, with a database used to store the exceptions in order to decrease the error rate. In the proposed method, both the speed of the stemmer and the percentage of errors are improved. The evaluation results on a small Farsi document collection show significant improvement in precisio...
A Bottom Up approach to Persian Stemming
Stemmers have many applications in natural language processing and in fields such as information retrieval. Many algorithms have been proposed for stemming. In this paper, we propose a new algorithm for the Persian language. Our algorithm is a bottom-up algorithm that is capable of reorganizing without changing the implementation. Our experiments show that the proposed algorithm has a suitable resu...