Estimating entropy of a language from optimal word insertion penalty
Authors
Abstract
The relationship between the optimal value of the word insertion penalty and the entropy of the language is discussed, based on the hypothesis that the optimal word insertion penalty compensates for the gap between the probability assigned by a language model and the true probability. It is shown that the optimal word insertion penalty can be calculated as the difference between the test-set entropy of the given language model and the true entropy of the given test-set sentences. The correctness of this idea is confirmed through recognition experiments, in which the entropy of a given set of sentences is estimated from two different language models, with the word insertion penalty optimized for each language model.
Similar resources
Balancing acoustic and linguistic probabilities
Under n-gram language modeling, which models only local probabilities, the length of the word sequence is not taken into account. Because of this, the optimal values of the language weight and word insertion penalty used to balance acoustic and linguistic probabilities are affected by the length of the word sequence. To deal with this problem, a new language model is developed based on Bernoulli trial mode...
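For context, the combined decoding score that the language weight and word insertion penalty enter into can be sketched as follows. This is the common log-domain formulation used in HMM decoders; the parameter values are illustrative assumptions, not taken from the paper:

```python
def decoding_score(log_p_acoustic, log_p_lm, num_words,
                   lm_weight=10.0, word_insertion_penalty=-0.5):
    """Combined log-domain score used to rank recognition hypotheses:
    log P(X|W) + lm_weight * log P(W) + num_words * penalty.
    Because the penalty term scales with num_words, its optimal value
    depends on the typical word-sequence length, which is the problem
    the snippet above describes."""
    return (log_p_acoustic
            + lm_weight * log_p_lm
            + num_words * word_insertion_penalty)

# A longer hypothesis pays (or gains) the penalty once per word:
short = decoding_score(-100.0, -20.0, num_words=5, lm_weight=1.0)
long_ = decoding_score(-100.0, -20.0, num_words=10, lm_weight=1.0)
```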
Whole-sentence exponential language models: a vehicle for linguistic-statistical integration
We introduce an exponential language model which models a whole sentence or utterance as a single unit. By avoiding the chain rule, the model treats each sentence as a “bag of features”, where features are arbitrary computable properties of the sentence. The new model is computationally more efficient, and more naturally suited to modeling global sentential phenomena, than the conditional expon...
Estimating the potential of signal and interlocutor-track information for language modeling
Although today most language models treat language purely as word sequences, there is recurring interest in tapping new sources of information, such as disfluencies, prosody, the interlocutor’s dialog act, and the interlocutor’s recent words. In order to estimate the potential value of such sources of information, we extend Shannon’s guessing-game method for estimating entropy to work for spoke...
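The guessing-game method referred to here can be sketched briefly. In Shannon's scheme, a predictor guesses the next symbol until correct, and the distribution of guess counts bounds the per-symbol entropy. A sketch based on Shannon's 1951 bounds, not code from the paper:

```python
import math
from collections import Counter

def guessing_game_entropy_bounds(guess_counts):
    """Entropy bounds (bits/symbol) from Shannon's guessing game.

    guess_counts: for each symbol in the test text, the number of
    guesses the predictor needed before it was correct.
    Returns (lower, upper) bounds on the per-symbol entropy:
      upper = entropy of the guess-number distribution,
      lower = sum_i i * (q_i - q_{i+1}) * log2(i),
    where q_i is the fraction of symbols identified on guess i."""
    n = len(guess_counts)
    counts = Counter(guess_counts)
    max_g = max(counts)
    q = {i: counts.get(i, 0) / n for i in range(1, max_g + 2)}
    upper = -sum(p * math.log2(p) for p in q.values() if p > 0)
    lower = sum(i * (q[i] - q.get(i + 1, 0.0)) * math.log2(i)
                for i in range(1, max_g + 1))
    return lower, upper

# If half the symbols need one guess and half need two,
# both bounds come out to 1 bit/symbol:
print(guessing_game_entropy_bounds([1, 1, 2, 2]))  # (1.0, 1.0)
```

The paper's extension applies this game to spoken language, letting the guesser see extra information such as prosody or the interlocutor's words.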
An Efficient Keyword Spotting Technique ... Language for Filler Models ...
The task of keyword spotting is to detect a set of keywords in continuous input speech. In a keyword spotter, not only the keywords but also the non-keyword intervals must be modeled; filler (or garbage) models are used for this purpose. To date, most keyword spotters have been based on hidden Markov models (HMMs); more specifically, a set of HMMs is used as garbage models. In this p...
Russian Loanword Adaptation in Persian; Optimal Approach
In this paper we analyze some of the phonological rules of Russian loanword adaptation in Persian from the viewpoint of Optimality Theory (OT) (Prince and Smolensky, 1993, 2003). It is the first study of the phonological processes involved in Russian loanword adaptation in Persian. After gathering about 50 current Russian loanwords, we selected some of them for analysis. We found that vowel insertion, vowel prothesis...