Lemmatization of Multi-word Common Noun Phrases and Named Entities in Polish

Author

  • Michal Marcinczuk
Abstract

In this paper we present PoLem, a tool for lemmatization of multi-word common noun phrases and named entities in Polish. The tool is based on a set of manually crafted rules and heuristics that use several dictionaries (morphological, named-entity, and inflection-pattern dictionaries). The tool reached a lemmatization accuracy of 97.99% on a dataset of multi-word common noun phrases and 86.17% in case-sensitive evaluation on a dataset of named entities.
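The difference between case-sensitive and case-insensitive evaluation can be illustrated with a minimal sketch. It assumes that accuracy is the share of phrases whose predicted lemma exactly matches the gold lemma; the abstract does not spell out the evaluation protocol, so the function name `lemmatization_accuracy`, the whitespace normalization, and the sample phrases are all hypothetical and not taken from the paper or its datasets.

```python
def lemmatization_accuracy(predicted, gold, case_sensitive=True):
    """Share of phrases whose predicted lemma exactly matches the gold lemma.

    Assumption: exact string match after collapsing whitespace; the
    paper's actual scoring procedure may differ.
    """
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        return 0.0

    def normalize(phrase):
        # Collapse repeated whitespace so spacing alone never causes a miss.
        phrase = " ".join(phrase.split())
        return phrase if case_sensitive else phrase.lower()

    hits = sum(normalize(p) == normalize(g) for p, g in zip(predicted, gold))
    return hits / len(gold)


# Hypothetical sample: the second prediction differs from gold only in case.
gold_lemmas = ["Polska Akademia Nauk", "Rada Ministrów", "czerwony samochód"]
predicted_lemmas = ["Polska Akademia Nauk", "rada ministrów", "czerwony samochód"]

strict = lemmatization_accuracy(predicted_lemmas, gold_lemmas, case_sensitive=True)
relaxed = lemmatization_accuracy(predicted_lemmas, gold_lemmas, case_sensitive=False)
print(f"case-sensitive: {strict:.4f}, case-insensitive: {relaxed:.4f}")
```

For named entities, capitalization is part of the correct base form, so a case-sensitive score is the stricter of the two measures.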

Similar resources

UNED at ImageCLEF 2004: Detecting Named Entities and Noun Phrases for Automatic Query Expansion and Structuring

This paper describes UNED experiments at the ImageCLEF bilingual ad hoc task. Two different strategies are attempted: i) automatic expansion and translation using noun phrases; ii) automatic detection of named entities in the query for structured search on image caption fields. All our experiments obtain results above the average MAP for the bilingual task. Structured searches using named enti...

Investigating Embedded Question Reuse in Question Answering

This paper investigates a novel method in question answering (QA) that enables a QA system to gain performance by reusing information in the answer to one question to answer another, related question. Our analysis shows that a pair of questions in general open-domain QA can have an embedding relation through their mentions of noun phrase expressions. We present methods f...

Coreference Resolution of Named Entities and Noun Phrases in Web Pages

An approach for intra-document coreference resolution of named entities and noun phrases is proposed. This is a knowledge-poor, integrated approach to coreference resolution that relies on syntactic, discourse and semantic information (using WordNet). Our approach is also intended to exploit the structural features of web pages for the purposes of discourse analysis. This research is i...

Naming clusters in visualization studies: Parsing and filtering of noun phrases from citation contexts

The present study presents a semi-automatic method for parsing and filtering of noun phrases from citation contexts. The purpose of the method is to extract contextual, agreed upon, and pertinent noun phrases, to be used in visualization studies for naming clusters (concept groups) or concept symbols. The method is applied in a case study, which forms part of a larger dissertation work concerni...

Russian Named Entities Recognition and Classification Using Distributed Word and Phrase Representations

The paper presents results on Russian named entities classification and equivalent named entities retrieval using word and phrase representations. It is shown that a word or an expression’s context vector is an efficient feature to be used for predicting the type of a named entity. Distributed word representations are now claimed (and on a reasonable basis) to be one of the most promising distr...

Publication date: 2017