Using the Spoken Dutch Corpus for type-logical grammar induction

نویسندگان

  • Michael Moortgat
  • Richard Moot
چکیده

Abstract The dependency-based annotation format employed within the Spoken Dutch Corpus (CGN) project (van der Wouden et al., 2002) has been designed in such a way as to enable a transparent mapping to the derivational structures of current ‘lexicalized’ grammar formalisms. Through such translations, the CGN tree bank can be used to train and evaluate computational grammars within these frameworks. In this paper we use the computational facilities of the Grail system (see Moot, 2002) to extract type logical grammars from the CGN annotation graphs. Grail is a general grammar development environment for type-logical categorial grammars (TLG). The Grail parsing engine combines proof net technology with structural rewriting.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Extraction of Type-Logical Supertags from the Spoken Dutch Corpus

The Spoken Dutch Corpus assigns 1 million of its 9 million total words a syntactic annotation in the form of dependency graphs. We will look at strategies for automatically extracting a lexicon of type-logical supertags from these dependency graphs and investigate how different levels of lexical detail affect the size of the resulting lexicon as well as the performancewith respect to supertag d...

متن کامل

CLIoS: Cross-Lingual Induction of Speech Recognition Grammars for the Localization of Spoken Dialog Systems

We present an approach for the cross-lingual induction of speech recognition grammars that separates the task of translation from the task of grammar generation. The source speech recognition grammar is used to generate phrases, which are translated by a common translation service. The target recognition grammar is induced by using the production rules of the source language, manually translate...

متن کامل

Clausal Coordinate Ellipsis and its Varieties in Spoken German: A Study with the TüBa-D/S Treebank of the VERBMOBIL Corpus

Grammar rules for Clausal Coordinate Ellipsis (CCE) are based nearly exclusively on linguistic judgments (intuitions). For German, the extent to which grammar rules based on this type of empirical evidence generate all and only CCE structures that populate text corpora, has only been explored with the TIGER treebank of written newspaper text. How well these rules fit spoken German is unknown. I...

متن کامل

Web data harvesting for speech understanding grammar induction

The development of a speech understanding grammar for spoken dialogue systems can be greatly accelerated by using an in-domain corpus. The development of such a corpus, however, is a slow and expensive process. This paper proposes unsupervised, language-agnostic methods for finding relevant corpora in the web and mining the most informative parts. We show that by utilizing perplexity we are abl...

متن کامل

Core Units of Spoken Grammar in Global ELT Textbooks

Materials evaluation studies have constantly demonstrated that there is no one fixed procedure for conducting textbook evaluation studies. Instead, the criteria must be selected according to the needs and objectives of the context in which evaluation takes place. The speaking skill as part of the communicative competence has been emphasized as an important objective in language teaching. The pr...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2002