Golden Rule of Morphology and Variants of Wordforms

Author

  • Jaroslava Hlaváčová

Abstract

In many languages, some words can be written in several ways. We call them variants. The values of all their morphological categories are identical, so variants receive an identical morphological tag. Since they also share a lemma, two or more wordforms end up with the same morphological description. This ambiguity may cause problems in various NLP applications. There are two types of variants: those affecting the whole paradigm (global variants) and those affecting only wordforms that share some combination of morphological values (inflectional variants). In this paper, we propose a way to tag all wordforms, including their variants, unambiguously. We call this requirement the "Golden rule of morphology". The paper deals mainly with Czech, but the ideas can be applied to other languages as well.
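
The ambiguity described in the abstract can be illustrated with a small check. This is a minimal sketch, not the paper's method: the function name, the entry layout (wordform, lemma, tag), and the simplified tag strings are assumptions made for illustration and do not follow any real Czech tagset. The check groups wordforms by their (lemma, tag) pair and reports every pair that maps to more than one wordform, that is, every place where the requirement of unambiguous tagging would not hold.

```python
from collections import defaultdict


def golden_rule_violations(entries):
    """Return {(lemma, tag): [wordforms]} for pairs shared by several wordforms.

    `entries` is an iterable of (wordform, lemma, tag) triples. The layout
    and the tag strings are hypothetical, chosen only for this illustration.
    """
    forms = defaultdict(set)
    for wordform, lemma, tag in entries:
        forms[(lemma, tag)].add(wordform)
    # A (lemma, tag) pair with more than one wordform is ambiguous.
    return {key: sorted(group) for key, group in forms.items() if len(group) > 1}


# Illustrative Czech inflectional variants with simplified, made-up tags.
ENTRIES = [
    ("mohu", "moci", "V.1.sg.pres"),
    ("můžu", "moci", "V.1.sg.pres"),
    ("můžeš", "moci", "V.2.sg.pres"),
    ("jazyce", "jazyk", "N.loc.sg"),
    ("jazyku", "jazyk", "N.loc.sg"),
    ("jazyka", "jazyk", "N.gen.sg"),
]

violations = golden_rule_violations(ENTRIES)
for (lemma, tag), wordforms in violations.items():
    print(lemma, tag, wordforms)
```

An empty result would mean that every (lemma, tag) pair identifies exactly one wordform in the given data. How the ambiguous pairs should be told apart is the subject of the paper itself and is not reproduced here.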

Similar resources

Reflections on the Application of the Golden Rule of Ethics in the Holy Quran

The golden rule of ethics has various formations and one of the most common ones says: "whatever you like for yourself like it for others too and whatever you do not like for yourself do not like it for others, too." The golden rule has been discussed in most of the religions including Abrahamic Religions. There are numerous narrations with the concept of golden rule but in the Holy Quran, it h...

Persia and the Golden Rule

My paper has two parts. First, I talk about the golden rule. After introducing the rule and its global importance, I explain why many scholars dismiss it as a vague proverb that leads to absurdities when we try to formulate it clearly. I defend the golden rule against such objections. Second, I talk about the golden rule in Persia and Islam; I consider Persian sources (Muslim and non-Muslim) an...

Efficient Stochastic Part-of-Speech Tagging for Hungarian

Many of the methods developed for Western European languages and used widespread to produce annotated language resources cannot readily be applied to Central and Eastern European languages, due to the large number of novel phenomena exhibited in the syntax and morphology of these languages, which these methods have to handle but have not been designed to cope with. The process of morphological ...

The Golden Principle of Ethics and Legal Prevention of Illegal Use of Databases

Background: Collecting data in the form of databases is one of the new methods that has been growing with the advancement of information technology. The use of databases in management and policy-making has created challenges in areas such as privacy breaches and compliance with accepted ethical norms. The increasing use of databases in public and private organizations in Iran has introduced new...

Rule-Based Normalization of Historical Texts

This paper deals with normalization of language data from Early New High German. We describe an unsupervised, rule-based approach which maps historical wordforms to modern wordforms. Rules are specified in the form of context-aware rewrite rules that apply to sequences of characters. They are derived from two aligned versions of the Luther bible and weighted according to their frequency. The eva...

Publication date: 2017