Bon: First Persian Stemmer

نویسندگان

  • M. Tashakori
  • M. Meybodi
  • F. Oroumchian
چکیده

Stemmers are softwares that find syntactic` roots of the words. They play an important role in natural language processing and other fields such as information retrieval (IR). In IR using stemmed words instead of the original words, could increase as much as 15 percent to the overall performance. In this paper, we report on the development of the first Persian stemmer (Bon). Bon is tested on a collection of Persian texts in the domain of computer science. In our experiments, the recall has been improved by 40 percent.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A new hybrid stemming algorithm for Persian

Stemming has been an influential part in Information retrieval and search engines. There have been tremendous endeavours in making stemmer that are both efficient and accurate. Stemmers can have three method in stemming, Dictionary based stemmer, statistical-based stemmers, and rulebased stemmers. This paper aims at building a hybrid stemmer that uses both Dictionary based method and rule-based...

متن کامل

Comparative Study of Various Persian Stemmers in the Field of Information Retrieval

In linguistics, stemming is the operation of reducing words to their more general form, which is called the ‘stem’. Stemming is an important step in information retrieval systems, natural language processing, and text mining. Information retrieval systems are evaluated by metrics like precision and recall and the fundamental superiority of an information retrieval system over another one is mea...

متن کامل

HPS: a hierarchical Persian stemming method

In this paper, a novel hierarchical Persian Stemming approach based on the Part-Of-Speech (POS) of the word in a sentence is presented. The implemented stemmer includes hash tables and several deterministic finite automata (DFA) in its different levels of hierarchy for removing the prefixes and suffixes of the words. We had two intentions in using hash tables in our method. The first one is tha...

متن کامل

A New Method for Stemming in Persian Language Considering Exceptions

In this paper a new algorithm for stemming in Farsi language is presented. This stemmer is based on removing the suffixes and prefixes but a database is used to save the exceptions to decrease error rate. In the proposed method the speed of stemmer and also the percentage of errors are improved. The evaluation results on a small Farsi document collection show significant improvement in precisio...

متن کامل

Ad Hoc Retrieval with the Persian Language

This paper describes our participation to the Persian ad hoc search during the CLEF 2009 evaluation campaign. In this task, we suggest using a light suffix-stripping algorithm for the Farsi (or Persian) language. The evaluations based on different probabilistic models demonstrated that our stemming approach performs better than a stemmer removing only the plural suffixes, or statistically bette...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2002