Evaluating a German Sketch Grammar: A Case Study on Noun Phrase Case
Abstract
Word sketches are part of the Sketch Engine corpus query system. They are automatic, corpus-derived summaries of a word's grammatical and collocational behaviour. Besides the corpus itself, word sketches require a sketch grammar, a regular expression-based shallow grammar over part-of-speech tags, which extracts evidence for the properties of the targeted words from the corpus. The paper presents a sketch grammar for German, a language which is not strictly configurational and which shows a considerable amount of case syncretism, and evaluates its accuracy, which has not been done for other sketch grammars. The evaluation focuses on NP case as a crucial part of German grammar. We present various versions of NP definitions, thereby demonstrating the influence of grammar detail on precision and recall.
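As a rough illustration of the idea in the abstract (a regular expression over part-of-speech tags, scored by precision and recall), here is a minimal Python sketch. It is not the paper's grammar and not Sketch Engine syntax: the tag names (`ART`, `ADJA`, `NN`, `VVFIN`) are assumed to follow STTS-style conventions, the NP pattern is a toy that ignores case entirely, and the helper names `encode`, `find_nps`, and `precision_recall` are hypothetical.

```python
import re

def encode(tags):
    """Join POS tags into one string so a regex can run over the tag sequence."""
    return "".join(f"<{t}>" for t in tags)

# Toy NP definition (an assumption, not the paper's): optional article,
# any number of attributive adjectives, then exactly one common noun.
NP_PATTERN = re.compile(r"(?:<ART>)?(?:<ADJA>)*<NN>")

def find_nps(tags):
    """Return (start, end) token spans (end exclusive) of toy NP matches."""
    text = encode(tags)
    spans = []
    for m in NP_PATTERN.finditer(text):
        # Each token contributes exactly one "<", so counting "<" gives token offsets.
        start = text[:m.start()].count("<")
        end = start + m.group().count("<")
        spans.append((start, end))
    return spans

def precision_recall(predicted, gold):
    """Compare predicted spans with gold spans by exact match."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall

# Hypothetical tag sequence, e.g. for "der alte Mann sieht das Haus".
tags = ["ART", "ADJA", "NN", "VVFIN", "ART", "NN"]
predicted = find_nps(tags)
```

A more detailed NP definition would change which spans `find_nps` returns, and `precision_recall` then shows the trade-off against a hand-annotated gold standard, which is the kind of comparison the abstract describes.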
Similar resources
An HPSG-Analysis for Free Relative Clauses in German
At the moment there is no theory for free relative clauses in German in the framework of Head-driven Phrase Structure Grammar (HPSG) (Pollard and Sag, 1994). From GB literature on the subject it is known that free relative clauses behave partly like noun phrases. They can fill argument positions of verbs. And although they are finite sentences, they are serialized like noun phrases in the Germa...
Incremental Identification of Inflectional Types
We present an approach to the incremental accrual of lexical information for unknown words that is constraint-based and compatible with standard unification-based grammars. Although the techniques are language-independent and can be applied to all kinds of information, in this paper we concentrate on the domain of German noun inflection. We show how morphological information, especially inflect...
A CHAT-Based Annotation Scheme for Case and Noun-Phrase Inflection in Child Language Data
This paper describes a coding scheme and a set of semi-automatic procedures for the annotation of complex noun phrases and their morpho-syntactic properties in child language data. These tools are based on the CHAT conventions of the Child Language Data Exchange System (MacWhinney 2000; CHILDES: http://childes.psy.cmu.edu/; CHAT: http://childes.psy.cmu.edu/manuals/chat.pdf). The coding scheme p...
Evaluating and Extending the Coverage of HPSG Grammars: A Case Study for German
In this work, we examine and attempt to extend the coverage of a German HPSG grammar. We use the grammar to parse a corpus of newspaper text and evaluate the proportion of sentences which have a correct attested parse, and analyse the cause of errors in terms of lexical or constructional gaps which prevent parsing. Then, using a maximum entropy model, we evaluate prediction of lexical types in ...
Towards a corpus-based dictionary of German noun-verb collocations
We describe our attempts to automatically extract raw material for a dictionary of German noun-verb collocations from large corpora of newspaper text. Such a dictionary should be about collocations and it should include a description of their linguistic properties, rather than listing the mere lexical cooccurrence. Since most statistical collocation finding tools do not provide other than lexic...