Kullback-Leibler Distance between Probabilistic Context-Free Grammars and Probabilistic Finite Automata

نویسندگان

  • Mark-Jan Nederhof
  • Giorgio Satta
چکیده

We consider the problem of computing the Kullback-Leibler distance, also called the relative entropy, between a probabilistic context-free grammar and a probabilistic finite automaton. We show that there is a closed-form (analytical) solution for one part of the Kullback-Leibler distance, viz. the cross-entropy. We discuss several applications of the result to the problem of distributional approximation of probabilistic context-free grammars by means of probabilistic finite automata.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Computing The Kullback-Leibler Divergence Between Probabilistic Automata Using Rational Kernels

Kullback-Leibler divergence is a natural distance measure between two probabilistic finite-state automata. Computing this distance is difficult, since it requires a summation over a countably infinite number of strings. Nederhof and Satta (2004) recently provided a solution in the course of solving the more general problem of finding the cross-entropy between a probabilistic context-free gramma...

متن کامل

On the Computation of Distances for Probabilistic Context-Free Grammars

Probabilistic context-free grammars (PCFGs) are used to define distributions over strings, and are powerful modelling tools in a number of areas, including natural language processing, software engineering, model checking, bio-informatics, and pattern recognition. A common important question is that of comparing the distributions generated or modelled by these grammars: this is done through che...

متن کامل

A General Technique to Train Language Models on Language Models

We show that under certain conditions, a language model can be trained on the basis of a second language model. The main instance of the technique trains a finite automaton on the basis of a probabilistic context-free grammar, such that the Kullback-Leibler distance between grammar and trained automaton is provably minimal. This is a substantial generalization of an existing algorithm to train ...

متن کامل

Finding the Most Probable String and the Consensus String: an Algorithmic Study

The problem of finding the most probable string for a distribution generated by a weighted finite automaton or a probabilistic grammar is related to a number of important questions: computing the distance between two distributions or finding the best translation (the most probable one) given a probabilistic finite state transducer. The problem is undecidable with general weights and is NP-hard ...

متن کامل

The most probable string: an algorithmic study

The problem of finding the consensus (most probable string) for a distribution generated by a weighted finite automaton or a probabilistic grammar is related to a number of important questions: computing the distance between two distributions or finding the best translation (the most probable one) given a probabilistic finite state transducer. The problem is undecidable with general weights and...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2004