Evaluation and Adaptation of the Celex Dutch Morphological Database

نویسندگان

  • Tom Laureys
  • Guy De Pauw
  • Hugo Van hamme
  • Walter Daelemans
  • Dirk Van Compernolle
چکیده

This paper describes some important modifications to the Celex morphological database in the context of the FLaVoR project. FLaVoR aims to develop a novel modular framework for speech recognition, enabling the integration of complex linguistic knowledge sources, such as a morphological model. Morphology is a fairly unexploited linguistic information source speech recognizers could benefit from. This is especially true for languages which allow for a rich set of morphological operations, such as our target language Dutch. In this paper we focus on the exploitation of the Celex Dutch morphological database as the information source underlying two different morphological analyzers being developed within the project. Although the Celex database provides a valuable source of morphological information for Dutch, many modifications were necessary before it could be practically applied. We identify major problems, discuss the implemented solutions and finally experimentally evaluate the effect of our modifications to the database.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A mixed word / morphological approach for extending CELEX for high coverage on contemporary large corpora

This paper describes an alternative approach to morphological language modeling, which incorporates constraints on the morphological production of new words. This is done by applying the constraints as a preprocessing step in which only one morphological production rule can be applied to an extended lexicon of known morphemes, lemmas and word forms. This approach is used to extend the CELEX Dut...

متن کامل

A Comparison of Two Different Approaches to Morphological Analysis of Dutch

This paper compares two systems for computational morphological analysis of Dutch. Both systems have been independently designed as separate modules in the context of the FLaVoR project, which aims to develop a modular architecture for automatic speech recognition. The systems are trained and tested on the same Dutch morphological database (CELEX), and can thus be objectively compared as morpho...

متن کامل

Knowledge-Free Induction of Inflectional Morphologies

We propose an algorithm to automatically induce the morphology of inflectional languages using only text corpora and no human input. Our algorithm combines cues from orthography, semantics, and syntactic distributions to induce morphological relationships in German, Dutch, and English. Using CELEX as a gold standard for evaluation, we show our algorithm to be an improvement over any knowledge-f...

متن کامل

Spelling space: A computational testbed for phonological and morphological changes in Dutch spelling

The Dutch spelling system, like other European spelling systems, represents a certain balance between preserving the spelling of morphemes (the morphological principle) and obeying letter-to-sound regularities (the phonological principle). We present experimental results with artificial learners that show a competition effect between the two principles: adhering more to one principle leads to m...

متن کامل

Refurbishing a Morphological Database for German

The CELEX database is one of the standard lexical resources for German. It yields a wealth of data especially for phonological and morphological applications. The morphological part comprises deep-structure morphological analyses of German. However, as it was developed in the Nineties, both encoding and spelling are outdated. About one fifth of over 50,000 datasets contain umlauts and signs suc...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2004