Estimating entropy of a language from optimal word insertion penalty

نویسندگان

  • Kazuya Takeda
  • Atsunori Ogawa
  • Fumitada Itakura
چکیده

The relationship between the optimal value of word insertion penalty and entropy of the language is discussed, based on the hypothesis that the optimal word insertion penalty compensates the probability given by a language model to the true probability. It is shown that the optimal word insertion penalty can be calculated as the difference between test set entropy of the given language model and true entropy of the given test set sentences. The correctness of the idea is confirmed through recognition experiment, where the entropy of the given set of sentences are estimated from two different language models and word insertion penalty optimized for each language model.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Balancing acoustic and linguistic probabilities

The length of the word sequence is not taken into account under language modeling of n-gram local probability modeling. Due to this property, the optimal values of the language weight and word insertion penalty for balancing acoustic and linguistic probabilities is affected by the length of word sequence. To deal with this problem, a new language model is developed based on Bernoulli trial mode...

متن کامل

Whole-sentence exponential language models: a vehicle for linguistic-statistical integration

We introduce an exponential language model which models a whole sentence or utterance as a single unit. By avoiding the chain rule, the model treats each sentence as a “bag of features”, where features are arbitrary computable properties of the sentence. The new model is computationally more efficient, and more naturally suited to modeling global sentential phenomena, than the conditional expon...

متن کامل

Estimating the potential of signal and interlocutor-track information for language modeling

Although today most language models treat language purely as word sequences, there is recurring interest in tapping new sources of information, such as disfluencies, prosody, the interlocutor’s dialog act, and the interlocutor’s recent words. In order to estimate the potential value of such sources of information, we extend Shannon’s guessing-game method for estimating entropy to work for spoke...

متن کامل

An Efficient Keyword Spotting Techni Language for Filler Mo

The task of keyword spotting is to detect a set of keywords in the input continuous speech. In a keyword spotter, not only the keywords, but also the non-keyword intervals must be modeled. For this purpose, filler (or garbage) models are used. To date, most of the keyword spotters have been based on hidden Markov models (HMM). More specifically, a set of HMM is used as garbage models. In this p...

متن کامل

Russian Loanword Adaptation in Persian; Optimal Approach

In this paper we analyzed some of the phonological rules of Russian loanword adaptation in Persian, on the view of Optimal Theory (OT) (Prince and Smolensky, 1993, 2003). It is the first study of phonological process on Russian loanwords adaptation in Persian. By gathering about 50 current Russian loanwords, we selected some of them to analyze. We found out that vowel insertion, vowel prothesis...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 1998