To stem or lemmatize a highly inflectional language in a probabilistic IR environment?
Abstract
The effects of three morphological methods for Finnish (lemmatization, stemming and inflectional stem generation) are compared in a probabilistic IR environment (INQUERY). Evaluation uses a four-point relevance scale that is partitioned differently in different test settings. Results show that inflectional stem generation, which has not been used much in IR, compares well with lemmatization in a best-match IR environment. Differences in performance between inflectional stem generation and lemmatization are small and are not statistically significant in most of the tested settings. It is also shown that stemming, a hitherto rather neglected method of morphological processing for Finnish, performs reasonably well, although the stemmer used (a Porter stemmer implementation) is far from optimal for a morphologically complex language like Finnish. In another series of tests, the effects of compound splitting and derivational expansion of queries are tested.
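To make the three methods concrete, here is a minimal Python sketch of how each one could relate the inflected forms of one Finnish noun, käsi ("hand"). Everything in it is an illustrative assumption and not the paper's resources or INQUERY's API: the lemma table, the suffix list and the stem table are tiny hand-written stand-ins, and the suffix stripper is far cruder than a real Porter-style stemmer.

```python
# Toy illustration only: all tables are hand-written assumptions,
# not the lexicons or tools used in the paper.

# Lemmatization: map each inflected form to its dictionary form.
LEMMAS = {
    "käsi": "käsi", "käden": "käsi", "kädessä": "käsi",
    "kättä": "käsi", "käteen": "käsi", "käsissä": "käsi",
}

# Stemming: strip a few case endings (hypothetical, very incomplete list).
SUFFIXES = ("ssä", "ssa", "n")

# Inflectional stem generation: for a query lemma, list the stems that
# its inflected forms can start with (hypothetical table for one word).
INFLECTIONAL_STEMS = {"käsi": ("käsi", "käde", "käte", "kät")}


def lemmatize(word):
    """Return the base form if the word is in the toy lexicon."""
    return LEMMAS.get(word, word)


def stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def match_by_stems(lemma, index_terms):
    """Match a query lemma against an inflected index by stem prefixes."""
    stems = INFLECTIONAL_STEMS.get(lemma, (lemma,))
    return [term for term in index_terms if term.startswith(stems)]


index_terms = ["käsi", "käden", "kädessä", "kättä", "käteen", "käsissä", "katu"]

print({t: lemmatize(t) for t in index_terms})  # lemma per index term
print({t: stem(t) for t in index_terms})       # stem per index term
print(match_by_stems("käsi", index_terms))     # index terms hit by the stems
```

In this toy setting the suffix stripper maps käden and kädessä to the same stem but leaves käsi and kättä apart, because stem-internal alternations are outside its reach. The generated stems are matched as prefixes against the unnormalized index, so they cover all the listed forms of käsi, at the risk of also matching unrelated words that share a short prefix such as "kät".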
Similar resources
Designing a NooJ Module for Turkish Inflectional Analysis: an Example of Highly Productive Morphology
Turkish is a highly inflectional language that represents an interesting challenge to traditional corpus processing techniques. We present here the design of a basic module that allows NooJ users to lemmatize and perform morphological analysis on Turkish texts.
The Inflectional “-y” at the End of Imperative Verb in Middle (Dari) Persian
In Early Modern Persian prose and verse, verbs with a present stem accompanied by grapheme "-y" have been used to express modal concepts of imperative and command or invocation and request. Researchers, regardless of the historical changes of Early Modern Persian, believe that this structure of subjunctive 2nd person singular has been used to express imperative mood, and that ...
Developing an automatic linguistic truncation operator for best-match retrieval of Finnish in inflected word form text database indexes
The paper presents a new method for handling of morphological variation of query terms in best-match IR. The method is based on enhanced inflectional stems. Use of inflectional stems has earlier been shown to be a good retrieval method in inflected indexes in a best-match environment for a highly inflected and compound-rich language, Finnish. In this paper the earlier stem method is elaborated ...
A Study of Inflectional Categories of Noun in Sistani Dialect
The present article aims to provide a synchronic study of the inflectional or morpho-syntactic categories of noun in Sistani dialect. These categories comprise person, number, gender or noun class, definiteness, case, and possession. Linguistic data was collected via recording free speech, and interviewing with 30 (15 females, 15 males) illiterate Sistani language consultants of age 40–102 year...
Journal title:
- Journal of Documentation
Volume 61
Publication date: 2005