Rules Versus Statistics: Insights From a Highly Inflected Language

Authors

  • Jelena Mirkovic
  • Mark S. Seidenberg
  • Marc F. Joanisse
Abstract

Inflectional morphology has been taken as a paradigmatic example of rule-governed grammatical knowledge (Pinker, 1999). The plausibility of this claim may be related to the fact that it is mainly based on studies of English, which has a very simple inflectional system. We examined the representation of inflectional morphology in Serbian, which encodes number, gender, and case for nouns. Linguists standardly characterize this system as a complex set of rules, with disagreements about their exact form. We present analyses of a large corpus of nouns which showed that, as in English, Serbian inflectional morphology is quasiregular: It exhibits numerous partial regularities creating neighborhoods that vary in size and consistency. We then asked whether a simple connectionist network could encode this statistical information in a manner that also supported generalization. A network trained on 3,244 Serbian nouns learned to produce correctly inflected phonological forms from a specification of a word's lemma, gender, number, and case, and generalized to untrained cases. The model's performance was sensitive to variables that also influence human performance, including surface and lemma frequency. It was also influenced by inflectional neighborhood size, a novel measure of the consistency of meaning to form mapping. A word-naming experiment with native Serbian speakers showed that this measure also affects human performance. The results suggest that, as in English, generating correctly inflected forms involves satisfying a small number of simultaneous probabilistic constraints relating form and meaning. Thus, common computational mechanisms may govern the representation and use of inflectional information across typologically diverse languages.

Similar resources

Assessing Assessment Literacy: Insights From a High-Stakes Test

This study constitutes an attempt to see what Language assessment literacy (LAL) is for three groups of stakeholders, namely LAL test developers, LAL instructors, and LAL test-takers. The perceptions of the former group were derived from the content analysis of the latest version of the LAL test, and those of the latter 2 groups were assessed through a survey designed by the researcher. Participant...

Rule Based Urdu Stemmer

This paper presents a rule-based Urdu stemmer. In this technique, rules are applied to remove suffixes and prefixes from inflected words. Urdu is widely spoken around the world, but little work has been done on Urdu stemming. A stemmer helps find the root of an inflected word. Various possibilities of inflected words like ںو (vao+noon-gunna), ے (badi-ye), ںای (choti-ye+alif+noon-gunna) ...

Unsupervised Formation Matching in Highly Inflected Languages

There have been multiple attempts to resolve various inflection matching problems in information retrieval. Stemming is a common approach to this end. Among many techniques for stemming, statistical stemming has been shown to be effective in a number of languages, particularly highly inflected languages. In this paper we propose a method for finding affixes in different positions of a word. Com...

Dictionary of Multiword Expressions for Translation into Highly Inflected Languages

Treatment of Multiword Expressions (MWEs) is one of the most complicated issues in natural language processing, especially in Machine Translation (MT). The paper presents a dictionary of MWEs for an English-Latvian MT system, demonstrating how MWEs could be handled for inflected languages with rich morphology and rather free word order. The proposed dictionary of MWEs consists of two constit...

Highly-Inflected Language Generation Using Factored Language Models

Statistical language models based on n-gram counts have been shown to successfully replace grammar rules in standard 2-stage (or ‘generate-and-select’) Natural Language Generation (NLG). In highly-inflected languages, however, the amount of training data required to cope with n-gram sparseness may be simply unobtainable, and the benefits of a statistical approach become less obvious. In this wor...

Journal:
  • Cognitive Science

Volume 35, Issue 4

Pages: -

Publication date: 2011