Kneser-Ney Smoothing With a Correcting Transformation for Small Data Sets
Author
Abstract
We present a technique that improves the Kneser-Ney smoothing algorithm for bigrams on small data sets, and we develop a numerical algorithm that computes the parameters of the corrected heuristic formula. We motivate the corrected formula with a simple example and use the same example to show the difficulties one may run into with the numerical algorithm. Applying the algorithm to test data, we show how the new formula improves cross-entropy.
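For orientation, the baseline that the abstract refers to can be sketched as follows. This is a minimal sketch of standard interpolated Kneser-Ney smoothing for bigrams together with a cross-entropy measure; it does not implement the paper's correcting transformation, which the abstract does not specify. The function names `train_kn_bigram` and `cross_entropy`, the toy corpus, and the fixed discount of 0.75 are assumptions made for illustration only.

```python
from collections import Counter, defaultdict
import math


def train_kn_bigram(tokens, discount=0.75):
    """Return prob(w, v) = P_KN(w | v) for interpolated Kneser-Ney bigrams.

    The discount value is an assumed constant, not one taken from the paper.
    """
    bigrams = Counter(zip(tokens, tokens[1:]))
    context_count = Counter()        # c(v): how often v occurs as a context
    followers = defaultdict(set)     # distinct words seen after v
    predecessors = defaultdict(set)  # distinct words seen before w
    for (v, w), c in bigrams.items():
        context_count[v] += c
        followers[v].add(w)
        predecessors[w].add(v)
    n_bigram_types = len(bigrams)

    def prob(w, v):
        # Continuation probability: share of bigram types that end in w.
        p_cont = len(predecessors.get(w, ())) / n_bigram_types
        c_v = context_count[v]
        if c_v == 0:
            # Unseen context: back off fully to the continuation distribution.
            return p_cont
        discounted = max(bigrams[(v, w)] - discount, 0.0) / c_v
        backoff_weight = discount * len(followers[v]) / c_v
        return discounted + backoff_weight * p_cont

    return prob


def cross_entropy(prob, tokens):
    """Average negative log2 probability per bigram of a test sequence."""
    pairs = list(zip(tokens, tokens[1:]))
    total = 0.0
    for v, w in pairs:
        p = prob(w, v)
        if p == 0.0:
            # A word never seen after any word in training gets zero mass here.
            return float("inf")
        total -= math.log2(p)
    return total / len(pairs)


# Toy data, purely illustrative.
train_tokens = "the cat sat on the mat the cat ate".split()
prob = train_kn_bigram(train_tokens)
print(cross_entropy(prob, "the cat sat".split()))
```

On a corpus this small, many bigram counts are 1, so a single fixed discount removes most of their probability mass; this is the regime in which the abstract says a correction helps.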
Similar resources
Richer Interpolative Smoothing Based on Modified Kneser-Ney Language Modeling
In this work we present a generalisation of the Modified Kneser-Ney interpolative smoothing for richer smoothing via additional discount parameters. We provide mathematical underpinning for the estimator of the new discount parameters, and showcase the utility of our rich MKN language models on several European languages. We further explore the interdependency among the training data size, lang...
Study on interaction between entropy pruning and Kneser-Ney smoothing
The paper presents an in-depth analysis of a less known interaction between Kneser-Ney smoothing and entropy pruning that leads to severe degradation in language model performance under aggressive pruning regimes. Experiments in a data-rich setup such as google.com voice search show a significant impact in WER as well: pruning Kneser-Ney and Katz models to 0.1% of their original impacts speech ...
Improved Smoothing for N-gram Language Models Based on Ordinary Counts
Kneser-Ney (1995) smoothing and its variants are generally recognized as having the best perplexity of any known method for estimating N-gram language models. Kneser-Ney smoothing, however, requires nonstandard N-gram counts for the lower-order models used to smooth the highest-order model. For some applications, this makes Kneser-Ney smoothing inappropriate or inconvenient. In this paper, we int...
A Bayesian Interpretation of Interpolated Kneser-Ney NUS School of Computing Technical Report TRA2/06
Interpolated Kneser-Ney is one of the best smoothing methods for n-gram language models. Previous explanations for its superiority have been based on intuitive and empirical justifications of specific properties of the method. We propose a novel interpretation of interpolated Kneser-Ney as approximate inference in a hierarchical Bayesian model consisting of Pitman-Yor processes. As opposed to p...
Journal title:
- IEEE Trans. Audio, Speech & Language Processing
Volume 15 Issue
Pages -
Publication date 2007