Polynomial Tree Substitution Grammars: an efficient framework for Data-Oriented Parsing

Authors

  • Jean-Cédric Chappelier
  • Martin Rajman

Abstract

Finding the most probable parse tree in the framework of Data-Oriented Parsing (DOP), a Stochastic Tree Substitution Parsing scheme developed by R. Bod (Bod 92), has been proven NP-hard in the most general case (Sima’an 96a). However, imposing some a priori restrictions on the choice of the elementary trees (i.e. grammar rules) leads to interesting DOP instances with polynomial time complexity. The purpose of this paper is to present such an instance, based on the minimal-maximal selection principle, and to evaluate its performance on two different corpora.
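
To make the terms in the abstract concrete, here is a minimal sketch of the standard DOP1 probability model associated with Bod's work. It is not the polynomial minimal-maximal instance this paper proposes. Each elementary tree (fragment) gets its relative frequency among fragments with the same root label, a derivation's probability is the product of its fragment probabilities, and a parse tree's probability is the sum over all derivations that yield it. The fragment names, counts and hand-listed derivations are hypothetical toy data, not taken from the paper or any corpus. Because a parse can have several derivations, the most probable parse need not be the parse of the most probable derivation, so a Viterbi-style search over derivations does not directly return it.

```python
import math
from collections import defaultdict

# Hypothetical toy fragment bank: fragment id -> (root label, corpus count).
# Real DOP1 counts come from every fragment of every treebank tree; these are invented.
FRAGMENTS = {
    "s_big": ("S", 3),    # one large S fragment covering the whole parse T1
    "s_small": ("S", 3),  # a smaller S fragment with NP and VP substitution sites
    "s_alt": ("S", 4),    # a large S fragment covering the competing parse T2
    "np_john": ("NP", 1),
    "vp_sleeps": ("VP", 1),
}

# Hypothetical derivations of one sentence, grouped by the parse tree they yield.
# A real parser would enumerate these; here they are listed by hand.
PARSES = {
    "T1": [["s_big"], ["s_small", "np_john", "vp_sleeps"]],
    "T2": [["s_alt"]],
}


def fragment_probs(fragments):
    """Relative frequency of each fragment among fragments with the same root label."""
    root_totals = defaultdict(int)
    for root, count in fragments.values():
        root_totals[root] += count
    return {fid: count / root_totals[root] for fid, (root, count) in fragments.items()}


def derivation_prob(derivation, probs):
    """Probability of one derivation: product of its fragment probabilities."""
    return math.prod(probs[fid] for fid in derivation)


def parse_prob(derivations, probs):
    """Probability of a parse tree: sum over all derivations that yield it."""
    return sum(derivation_prob(d, probs) for d in derivations)


def most_probable_parse(parses, probs):
    """Parse tree with the highest summed probability."""
    return max(parses, key=lambda t: parse_prob(parses[t], probs))


def parse_of_most_probable_derivation(parses, probs):
    """Parse tree yielded by the single highest-probability derivation."""
    return max(parses, key=lambda t: max(derivation_prob(d, probs) for d in parses[t]))


if __name__ == "__main__":
    probs = fragment_probs(FRAGMENTS)
    for tree, derivs in PARSES.items():
        print(tree, round(parse_prob(derivs, probs), 3))
    print("most probable parse:", most_probable_parse(PARSES, probs))
    print("parse of most probable derivation:", parse_of_most_probable_derivation(PARSES, probs))
```

With these toy counts the script prints `T1 0.6` and `T2 0.4`, then reports T1 as the most probable parse even though T2 owns the single most probable derivation (0.4 versus 0.3 for each derivation of T1).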

Similar resources

Inducing Compact but Accurate Tree-Substitution Grammars

Tree substitution grammars (TSGs) are a compelling alternative to context-free grammars for modelling syntax. However, many popular techniques for estimating weighted TSGs (under the moniker of Data Oriented Parsing) suffer from the problems of inconsistency and over-fitting. We present a theoretically principled model which solves these problems using a Bayesian non-parametric formulation. Our...

Efficient Disambiguation by Means of Stochastic Tree Substitution Grammars

In Stochastic Tree Substitution Grammars (STSGs), one parse (tree) of an input sentence can be generated by exponentially many derivations; the probability of a parse is defined as the sum of the probabilities of its derivations. As a result, some methods of Stochastic Context-Free Grammars (SCFGs), e.g. the Viterbi algorithm for finding the most probable parse (MPP) of an input sentence, are not ...

Parsing with the Shortest Derivation

Common wisdom has it that the bias of stochastic grammars in favor of shorter derivations of a sentence is harmful and should be redressed. We show that the common wisdom is wrong for stochastic grammars that use elementary trees instead of context-free rules, such as Stochastic Tree-Substitution Grammars used by Data-Oriented Parsing models. For such grammars a non-probabilistic metric based o...

Parsing preferences with Lexicalized Tree Adjoining Grammars exploiting the derivation tree

Since Kimball (73), parsing preference principles such as "Right association" (RA) and "Minimal attachment" (MA) have often been formulated with respect to constituent trees. We present 3 preference principles based on "derivation trees" within the framework of LTAGs. We argue they remedy some shortcomings of the former approaches and account for widely accepted heuristics (e.g. argument/modifier, idi...

Tabulation of Automata for Tree-Adjoining Languages

We propose a modular design of tabular parsing algorithms for tree-adjoining languages. The modularity is made possible by a separation of the parsing strategy from the mechanism of tabulation. The parsing strategy is expressed in terms of the construction of a nondeterministic automaton from a grammar; three distinct types of automaton will be discussed. The mechanism of tabulation leads to the...

Publication date: 2001