Fitting maximum-entropy models on large sample spaces

Authors

  • Edward Schofield
  • Gernot Kubin
  • David Thornley
  • Tomas Nordström
  • Andreas Türk
Abstract

This thesis investigates the iterative application of Monte Carlo methods to the problem of parameter estimation for models of maximum entropy, minimum divergence, and maximum likelihood among the class of exponential-family densities. It describes a suite of tools for applying such models to large domains in which exact computation is not practically possible. The first result is a derivation of estimators for the Lagrange dual of the entropy and its gradient using importance sampling from a measure on the same probability space or its image under the transformation induced by the canonical sufficient statistic. This yields two benefits. One is the flexibility to choose an auxiliary distribution for sampling that reduces the standard error of the estimates for a given sample size. The other is the opportunity to re-weight a fixed sample iteratively, which can cut the computational burden for each iteration. The second result is the derivation of matrix–vector expressions for these estimators. Importance-sampling estimates of the entropy dual and its gradient can be computed efficiently from a fixed sample; the computation is dominated by two matrix–vector products involving the same matrix of sample statistics. The third result is an experimental study of the application of these estimators to the problem of estimating whole-sentence language models. The use of importance sampling in conjunction with sample-path optimization is feasible whenever the auxiliary distribution does not too severely under-represent any linguistic features under constraint. Parameter estimation is rapid, requiring a few minutes with a 2006-vintage computer to fit models under hundreds of thousands of constraints. The procedure is most effective when used to minimize divergence (relative entropy) from existing baseline models, such as n-grams estimated by traditional means, rather than to maximize entropy under constraints on the probabilities of rare n-grams.
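
To make the second result concrete, the sketch below (illustrative only, not code from the thesis) shows how importance-sampling estimates of the entropy dual and its gradient might be computed from a fixed sample. It assumes a discrete exponential-family model p_theta(x) proportional to exp(theta . f(x)), target feature expectations b, a sample x_1, ..., x_n drawn once from an auxiliary distribution q, and a matrix F of sample statistics with F[i, j] = f_j(x_i); the function and variable names are hypothetical.

```python
import numpy as np

def entropy_dual_and_grad(theta, F, logq, b):
    """Importance-sampling estimates of the entropy dual and its gradient.

    theta : (m,) current parameter vector
    F     : (n, m) matrix of sample statistics, F[i, j] = f_j(x_i)
            for points x_1..x_n drawn once from the auxiliary density q
    logq  : (n,) values of log q(x_i) for the fixed sample
    b     : (m,) target feature expectations (the constraints)
    """
    # First matrix-vector product: theta . f(x_i) for every sample point.
    log_unnorm = F @ theta
    # Log importance weights w_i = exp(theta . f(x_i)) / q(x_i), kept in log space.
    logw = log_unnorm - logq
    # Stabilised estimate of log Z(theta), where Zhat = (1/n) * sum_i w_i.
    logw_max = logw.max()
    w = np.exp(logw - logw_max)
    logZhat = logw_max + np.log(w.mean())
    # Estimated dual L(theta) = log Z(theta) - theta . b.
    dual = logZhat - theta @ b
    # Second matrix-vector product with the same matrix F: self-normalised
    # estimate of E_theta[f(X)], giving the gradient estimate E_theta[f] - b.
    p = w / w.sum()
    grad = F.T @ p - b
    return dual, grad
```

In a sample-path optimization setting, estimates of this form would simply be handed to a general-purpose gradient-based optimizer, with the same fixed sample re-weighted at every iteration rather than redrawn.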


Related articles

Fast parameter estimation for joint maximum entropy language models

This paper discusses efficient parameter estimation methods for joint (unconditional) maximum entropy language models such as whole-sentence models. Such models are a sound framework for formalizing arbitrary linguistic knowledge in a consistent manner. It has been shown that general-purpose gradient-based optimization methods are among the most efficient algorithms for estimating parameters of...

A Note on the Bivariate Maximum Entropy Modeling

Let X = (X1, X2) be a continuous random vector. Under the assumption that the marginal distributions of X1 and X2 are given, we develop models for the vector X when there is partial information about the dependence structure between X1 and X2. The models obtained from the well-known principle of maximum entropy are called maximum entropy (ME) mo...

Spatio-temporal spike trains analysis for large scale networks using maximum entropy principle and Monte-Carlo method

Understanding the dynamics of neural networks is a major challenge in experimental neuroscience. For that purpose, a model of the recorded activity that reproduces the main statistics of the data is required. In the first part, we present a review of recent results on spike train statistics analysis using maximum entropy (MaxEnt) models. Most of these studies have been focusing on m...

Structure of the velocity distribution of the Galactic disc. A maximum entropy statistical approach - Part I

The maximum entropy approach is proposed to describe the local structures of the velocity distribution, which are collected through its sample moments. The method is used with several samples from the HIPPARCOS and Geneva-Copenhagen survey catalogues. For the large-scale distribution, the phase density function may be obtained by fitting moments up to sixth order as a product of two exponential...

Efficiency Bound of Local Z-Estimators on Discrete Sample Spaces

Many statistical models over a discrete sample space face the computational difficulty of evaluating the normalization constant, which makes the maximum likelihood estimator impractical. To circumvent this difficulty, alternative estimators such as pseudo-likelihood and composite likelihood, which require only local computation over the sample space, have been proposed. In ...
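
As a minimal illustration of why such local estimators sidestep the normalization constant (a sketch under assumed notation, not taken from the cited paper): for a hypothetical pairwise binary model with symmetric couplings W and biases h, each conditional p(x_j | x_-j) is a logistic function of a local field, so the pseudo-log-likelihood can be evaluated without ever computing the global partition function.

```python
import numpy as np

def neg_pseudo_log_likelihood(W, h, X):
    """Negative pseudo-log-likelihood of a pairwise binary (+/-1) model.

    W : (d, d) symmetric coupling matrix with zero diagonal
    h : (d,) bias vector
    X : (n, d) data matrix with entries in {-1, +1}

    Each conditional p(x_j | x_-j) = sigmoid(2 * x_j * (sum_k W_jk x_k + h_j))
    involves only the local field, never the global normalization constant.
    """
    field = X @ W + h                 # (n, d) local fields
    margins = 2.0 * X * field         # 2 * x_ij * field_ij, elementwise
    # -log sigmoid(margins) summed over all entries = negative pseudo-log-likelihood.
    return np.logaddexp(0.0, -margins).sum()
```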


Journal:

Volume   Issue

Pages  -

Publication date: 2006