Prefix Probabilities from Stochastic Tree Adjoining Grammars
Abstract
Language models for speech recognition typically use a probability model of the form $\Pr(a_n \mid a_1, a_2, \ldots, a_{n-1})$. Stochastic grammars, on the other hand, are typically used to assign structure to utterances. A language model of the above form is constructed from such grammars by computing the prefix probability $\sum_{w \in \Sigma^*} \Pr(a_1 \cdots a_n w)$, where $w$ ranges over all possible terminations of the prefix $a_1 \cdots a_n$. The main result in this paper is an algorithm to compute such prefix probabilities given a stochastic Tree Adjoining Grammar (TAG). The algorithm achieves the required computation in $O(n^6)$ time. The probabilities of subderivations that do not derive any words in the prefix, but contribute structurally to its derivation, are precomputed to achieve termination. This algorithm enables existing corpus-based estimation techniques for stochastic TAGs to be used for language modelling.
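To make the link between the two quantities concrete: the conditional probability of the next word is the ratio of two prefix probabilities, the one for $a_1 \cdots a_n$ divided by the one for $a_1 \cdots a_{n-1}$. The sketch below illustrates only that relationship. It is not the paper's $O(n^6)$ TAG algorithm: it assumes a hypothetical finite distribution over complete sentences (`SENTENCE_PROBS`, invented for illustration) and computes prefix probabilities by brute-force enumeration. The function names are likewise assumptions.

```python
from fractions import Fraction

# Hypothetical toy distribution over complete sentences (not from the paper).
# A stochastic grammar would define such a distribution implicitly, usually
# over infinitely many strings; here it is finite so it can be enumerated.
SENTENCE_PROBS = {
    ("the", "dog", "barks"): Fraction(1, 2),
    ("the", "dog", "sleeps"): Fraction(1, 4),
    ("the", "cat", "sleeps"): Fraction(1, 8),
    ("a", "cat", "sleeps"): Fraction(1, 8),
}


def prefix_probability(prefix):
    """Sum the probabilities of all sentences that start with `prefix`."""
    prefix = tuple(prefix)
    total = Fraction(0)
    for sentence, prob in SENTENCE_PROBS.items():
        if sentence[:len(prefix)] == prefix:
            total += prob
    return total


def next_word_probability(history, word):
    """Pr(word | history) as a ratio of two prefix probabilities."""
    history = tuple(history)
    denominator = prefix_probability(history)
    if denominator == 0:
        raise ValueError("history has zero prefix probability")
    return prefix_probability(history + (word,)) / denominator
```

In this toy setting, `next_word_probability(("the",), "dog")` divides the prefix probability of "the dog" (3/4) by that of "the" (7/8). A grammar-based algorithm has to obtain the same sums without enumerating sentences, since the set of terminations is generally infinite.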
Similar resources
Prefix probabilities for linear indexed grammars
We show how prefix probabilities can be computed for stochastic linear indexed grammars (SLIGs). Our results apply as well to stochastic tree-adjoining grammars (STAGs), due to their equivalence to SLIGs.
Prefix Probabilities for Linear Context-Free Rewriting Systems
We present a novel method for the computation of prefix probabilities for linear context-free rewriting systems. Our approach streamlines previous procedures to compute prefix probabilities for context-free grammars, synchronous context-free grammars and tree adjoining grammars. In addition, the methodology is general enough to be used for a wider range of problems involving, for example, sever...
Prefix Probabilities for Linear Indexed Grammars
We show how prefix probabilities can be computed for stochastic linear indexed grammars (SLIGs). Our results apply as well to stochastic tree-adjoining grammars (STAGs), due to their equivalence to SLIGs.
Prefix Probabilities from Stochastic Tree Adjoining Grammars
Language models for speech recognition typically use a probability model of the form $\Pr(a_n \mid a_1, a_2, \ldots, a_{n-1})$. Stochastic grammars, on the other hand, are typically used to assign structure to utterances. A language model of the above form is constructed from such grammars by computing the prefix probability $\sum_{w \in \Sigma^*} \Pr(a_1 \cdots a_n w)$, where $w$ represents all possible terminations of the prefix $a_1 \cdots a_n$. The...
Proceedings of the 9th International Workshop Finite State Methods and Natural Language Processing
The paradigm of parsing as intersection has been used throughout the literature to obtain elegant and general solutions to numerous problems involving grammars and automata. The paradigm has its origins in Bar-Hillel et al. (1964), where a general construction was used to prove closure of context-free languages under intersection with regular languages. It was pointed out by Lang (1994) that ...