PCFG Extraction and Pre-typed Sentence Analysis
Abstract
We explain how we extracted a PCFG (probabilistic context-free grammar) from the Paris VII treebank. First, we transform the syntactic trees of the corpus into derivation trees. The transformation is done with a generalized tree transducer, a variation of the usual top-down tree transducers, and yields derivation trees for an AB grammar, a subset of Lambek grammar that contains only the left and right elimination rules. We then extract a PCFG from the derivation trees, assuming that they are representative of the grammar. The extracted grammar is used for sentence analysis through a slightly modified CYK algorithm that takes the probabilities into account. This lets us determine whether a sentence is included in the language described by the grammar.
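The two later steps of the pipeline (estimating rule probabilities from derivation trees, then analysing a pre-typed sentence with a probability-aware CYK) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the authors' implementation: the tuple encoding of trees, the relative-frequency estimate, the Viterbi-style maximisation, the goal type `s`, and the function names `extract_pcfg` and `cyk_best` are all hypothetical. The abstract does not say how probabilities are estimated or how the CYK algorithm was modified.

```python
from collections import Counter, defaultdict

# Assumed encoding: a derivation tree is either a type string (a leaf, i.e. the
# type already assigned to a word) or a tuple (parent_type, left, right).
# AB types are written as strings, e.g. "np\\s" for np\s and "(np\\s)/np".

def root(tree):
    """Return the type at the root of a (sub)tree."""
    return tree if isinstance(tree, str) else tree[0]

def count_rules(tree, counts):
    """Count every binary rule parent -> left right used in the tree."""
    if isinstance(tree, str):
        return
    parent, left, right = tree
    counts[(parent, root(left), root(right))] += 1
    count_rules(left, counts)
    count_rules(right, counts)

def extract_pcfg(trees):
    """Relative-frequency estimate: P(rule) = count(rule) / count(parent)."""
    counts = Counter()
    for tree in trees:
        count_rules(tree, counts)
    totals = Counter()
    for (parent, _, _), c in counts.items():
        totals[parent] += c
    return {rule: c / totals[rule[0]] for rule, c in counts.items()}

def cyk_best(types, pcfg, goal="s"):
    """Probability of the best derivation of `goal` over a pre-typed sentence.

    Returns 0.0 when the type sequence is not in the language of the grammar.
    """
    n = len(types)
    by_children = defaultdict(list)
    for (parent, left, right), p in pcfg.items():
        by_children[(left, right)].append((parent, p))
    # chart[i][j] maps a type to the best probability of deriving types[i:j].
    chart = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
    for i, t in enumerate(types):
        chart[i][i + 1][t] = 1.0  # the sentence is pre-typed: leaves are given
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            cell = chart[i][j]
            for k in range(i + 1, j):
                for left, p_left in chart[i][k].items():
                    for right, p_right in chart[k][j].items():
                        for parent, p in by_children.get((left, right), []):
                            prob = p * p_left * p_right
                            if prob > cell.get(parent, 0.0):
                                cell[parent] = prob
    return chart[0][n].get(goal, 0.0)

# Toy derivation trees (invented for illustration, not taken from the treebank).
trees = [
    ("s", "np", ("np\\s", "(np\\s)/np", "np")),  # transitive clause
    ("s", "np", "np\\s"),                        # intransitive clause
    ("s", "s/s", ("s", "np", "np\\s")),          # sentence modifier
]
pcfg = extract_pcfg(trees)
print(cyk_best(["np", "(np\\s)/np", "np"], pcfg))  # accepted: non-zero probability
print(cyk_best(["np", "np"], pcfg))                # rejected: 0.0
```

Because the input is a sequence of types and not of words, the chart is seeded directly with the given types and no lexical rules are needed; a zero result means the sequence is not derivable from the extracted rules.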
Similar resources
Extraction de PCFG et analyse de phrases pré-typées (PCFG Extraction and Pre-typed Sentences Analysis) [in French]
PCFG Extraction and Pre-typed Sentences Analysis. This article explains the way we extract a PCFG from the Paris VII treebank. Firstly, we need to transform the syntactic trees of the corpus into derivation trees. The transformation is done with a generalized tree transducer, a variation of the usual top-down tree transducers, and yields derivation trees for an AB grammar. Secondel...
Unsupervised PCFG Induction for Grounded Language Learning with Highly Ambiguous Supervision
“Grounded” language learning employs training data in the form of sentences paired with relevant but ambiguous perceptual contexts. Börschinger et al. (2011) introduced an approach to grounded language learning based on unsupervised PCFG induction. Their approach works well when each sentence potentially refers to one of a small set of possible meanings, such as in the sportscasting task. Howev...
Comparing the Ambiguity Reduction Abilities of Probabilistic Context-Free Grammars
We present a measure for evaluating Probabilistic Context Free Grammars (PCFG) based on their ambiguity resolution capabilities. Probabilities in a PCFG can be seen as a filtering mechanism: for an ambiguous sentence, the trees bearing maximum probability are singled out, while all others are discarded. The level of ambiguity is related to the size of the singled out set of trees. Under our meas...
Parsing with Context-Free Grammars and Word Statistics
We present a language model in which the probability of a sentence is the sum of the individual parse probabilities, and these are calculated using a probabilistic context-free grammar (PCFG) plus statistics on individual words and how they fit into parses. We have used the model to improve syntactic disambiguation. After training on Wall Street Journal (WSJ) text we tested on about 200 WSJ sente...
Accurate and Robust LFG-Based Generation for Chinese
We describe three PCFG-based models for Chinese sentence realisation from Lexical-Functional Grammar (LFG) f-structures. Both the lexicalised model and the history-based model improve on the accuracy of a simple wide-coverage PCFG model by adding lexical and contextual information to weaken inappropriate independence assumptions implicit in the PCFG models. In addition, we provide techniques for...