A Bayesian Interpretation of Interpolated Kneser-Ney
NUS School of Computing Technical Report TRA2/06
Abstract
Interpolated Kneser-Ney is one of the best smoothing methods for n-gram language models. Previous explanations for its superiority have been based on intuitive and empirical justifications of specific properties of the method. We propose a novel interpretation of interpolated Kneser-Ney as approximate inference in a hierarchical Bayesian model consisting of Pitman-Yor processes. As opposed to past explanations, our interpretation can recover exactly the formulation of interpolated Kneser-Ney, and performs better than interpolated Kneser-Ney when a better inference procedure is used.
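To make the method under discussion concrete, here is a minimal sketch of interpolated Kneser-Ney for bigrams with a single fixed discount. The function name, the corpus, and the discount value 0.75 are illustrative assumptions, not taken from the report; the report's contribution is showing that this recipe emerges as approximate inference in a hierarchical Pitman-Yor model.

```python
from collections import Counter, defaultdict

def kneser_ney_bigram(corpus, discount=0.75):
    """Interpolated Kneser-Ney bigram probabilities (single fixed discount).

    `corpus` is a list of tokens. Seen counts are discounted, and the
    leftover mass is interpolated with the continuation probability
    (how many distinct contexts a word follows), not the raw unigram.
    """
    bigrams = Counter(zip(corpus, corpus[1:]))
    context_counts = Counter(corpus[:-1])        # c(u): times u occurs as a context
    followers = defaultdict(set)                 # distinct words seen after u
    histories = defaultdict(set)                 # distinct contexts seen before w
    for (u, w) in bigrams:
        followers[u].add(w)
        histories[w].add(u)
    total_types = len(bigrams)                   # total number of bigram types

    def prob(u, w):
        c_uw, c_u = bigrams[(u, w)], context_counts[u]
        p_cont = len(histories[w]) / total_types  # continuation probability of w
        if c_u == 0:
            return p_cont                         # unseen context: back off fully
        lam = discount * len(followers[u]) / c_u  # interpolation weight
        return max(c_uw - discount, 0) / c_u + lam * p_cont

    return prob
```

Because the discounted mass is exactly redistributed through the interpolation weight, the probabilities over the vocabulary sum to one for any context, seen or unseen.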
Related Papers
A Hierarchical Bayesian Language Model Based On Pitman-Yor Processes
We propose a new hierarchical Bayesian n-gram model of natural languages. Our model makes use of a generalization of the commonly used Dirichlet distributions called Pitman-Yor processes which produce power-law distributions more closely resembling those in natural languages. We show that an approximation to the hierarchical Pitman-Yor language model recovers the exact formulation of interpolat...
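The connection claimed above rests on the Pitman-Yor predictive rule in its Chinese-restaurant form. Below is a hedged sketch of that rule; the function and variable names are illustrative, not the paper's notation, and `base` stands in for the parent (base-distribution) probability.

```python
def pitman_yor_predictive(counts, tables, theta, d, base):
    """Predictive probability of the next draw under a Pitman-Yor process.

    counts[k]: customers eating dish k; tables[k]: tables serving dish k;
    theta: concentration; d: discount; base(k): base-distribution probability.
    Each dish's count is discounted by d per table, and the freed mass
    (plus theta) is sent to the base distribution.
    """
    c_total = sum(counts.values())
    t_total = sum(tables.values())

    def p(k):
        discounted = max(counts.get(k, 0) - d * tables.get(k, 0), 0)
        backoff = (theta + d * t_total) * base(k)
        return (discounted + backoff) / (theta + c_total)

    return p
```

With theta = 0 and exactly one table per distinct dish, this reduces term by term to the interpolated Kneser-Ney form (absolute discounting plus a continuation-style backoff), which is the correspondence the abstract describes.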
Bayesian Language Modelling of German Compounds
In this work we address the challenge of augmenting n-gram language models according to prior linguistic intuitions. We argue that the family of hierarchical Pitman-Yor language models is an attractive vehicle through which to address the problem, and demonstrate the approach by proposing a model for German compounds. In our empirical evaluation the model outperforms a modified Kneser-Ney n-gra...
A Generalized Language Model as the Combination of Skipped n-grams and Modified Kneser Ney Smoothing
The Copyright of this work is owned by the Association for Computational Linguistics (ACL). However, each of the authors and the employers for whom the work was performed reserve all other rights, specifically including the following: ... (4) The right to make copies of the work for internal distribution within the author’s organization and for external distribution as a preprint, reprint, tech...
Continuous space language models
This paper describes the use of a neural network language model for large vocabulary continuous speech recognition. The underlying idea of this approach is to attack the data sparseness problem by performing the language model probability estimation in a continuous space. Highly efficient learning algorithms are described that enable the use of training corpora of several hundred million words....
Study on interaction between entropy pruning and Kneser-Ney smoothing
The paper presents an in-depth analysis of a less-known interaction between Kneser-Ney smoothing and entropy pruning that leads to severe degradation in language model performance under aggressive pruning regimes. Experiments in a data-rich setup such as google.com voice search show a significant impact on WER as well: pruning Kneser-Ney and Katz models to 0.1% of their original size impacts speech ...