Fitting maximum-entropy models on large sample spaces
Authors
Abstract
This thesis investigates the iterative application of Monte Carlo methods to the problem of parameter estimation for models of maximum entropy, minimum divergence, and maximum likelihood among the class of exponential-family densities. It describes a suite of tools for applying such models to large domains in which exact computation is not practically possible. The first result is a derivation of estimators for the Lagrange dual of the entropy and its gradient using importance sampling from a measure on the same probability space or its image under the transformation induced by the canonical sufficient statistic. This yields two benefits. One is the flexibility to choose an auxiliary distribution for sampling that reduces the standard error of the estimates for a given sample size. The other is the opportunity to re-weight a fixed sample iteratively, which can cut the computational burden for each iteration. The second result is the derivation of matrix–vector expressions for these estimators. Importance-sampling estimates of the entropy dual and its gradient can be computed efficiently from a fixed sample; the computation is dominated by two matrix–vector products involving the same matrix of sample statistics. The third result is an experimental study of the application of these estimators to the problem of estimating whole-sentence language models. The use of importance sampling in conjunction with sample-path optimization is feasible whenever the auxiliary distribution does not too severely under-represent any linguistic features under constraint. Parameter estimation is rapid, requiring a few minutes with a 2006-vintage computer to fit models under hundreds of thousands of constraints. The procedure is most effective when used to minimize divergence (relative entropy) from existing baseline models, such as n-grams estimated by traditional means, rather than to maximize entropy under constraints on the probabilities of rare n-grams.
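The core computation described above can be illustrated with a minimal numpy sketch. It estimates the Lagrange dual of the entropy and its gradient for an exponential-family model by importance sampling from an auxiliary distribution, with the cost dominated by two matrix–vector products involving the same matrix of sample statistics. The function name `dual_and_grad` and the calling convention are illustrative assumptions, not the thesis's actual interface.

```python
import numpy as np

def dual_and_grad(theta, F, logq, target):
    """Importance-sampling estimates of the entropy dual and its gradient.

    theta  : (m,) parameter vector of the exponential-family model
    F      : (n, m) matrix of sufficient statistics for a fixed sample
             drawn from the auxiliary distribution q
    logq   : (n,) log-density of q at each sample point
    target : (m,) target feature expectations (the constraints)

    The dominant cost is two matrix-vector products with the same
    matrix F: one to form F @ theta, one to form F.T @ w.
    """
    n = F.shape[0]
    # Log importance weights: log p_unnormalized(x_i) - log q(x_i).
    logw = F @ theta - logq                       # first product
    # Partition-function estimate: Z ~ (1/n) * sum_i exp(logw_i).
    logZ = np.logaddexp.reduce(logw) - np.log(n)
    # Self-normalized weights for estimating model expectations.
    w = np.exp(logw - logw.max())
    w /= w.sum()
    dual = logZ - theta @ target                  # Lagrange dual of the entropy
    grad = F.T @ w - target                       # second product
    return dual, grad
```

Because the sample and its statistics matrix `F` are fixed, the same arrays can be re-weighted at every optimizer iteration by recomputing only these two products, which is what makes sample-path optimization over a fixed sample cheap.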
Similar resources
Fast parameter estimation for joint maximum entropy language models
This paper discusses efficient parameter estimation methods for joint (unconditional) maximum entropy language models such as whole-sentence models. Such models are a sound framework for formalizing arbitrary linguistic knowledge in a consistent manner. It has been shown that general-purpose gradient-based optimization methods are among the most efficient algorithms for estimating parameters of...
A Note on the Bivariate Maximum Entropy Modeling
Let X=(X1 ,X2 ) be a continuous random vector. Under the assumption that the marginal distributions of X1 and X2 are given, we develop models for vector X when there is partial information about the dependence structure between X1 and X2. The models which are obtained based on well-known Principle of Maximum Entropy are called the maximum entropy (ME) mo...
Spatio-temporal spike trains analysis for large scale networks using maximum entropy principle and Monte-Carlo method
Understanding the dynamics of neural networks is a major challenge in experimental neuroscience. For that purpose, a modelling of the recorded activity that reproduces the main statistics of the data is required. In a first part, we present a review on recent results dealing with spike train statistics analysis using maximum entropy models (MaxEnt). Most of these studies have been focusing on m...
Structure of the velocity distribution of the Galactic disc. A maximum entropy statistical approach - Part I
The maximum entropy approach is proposed to describe the local structures of the velocity distribution, which are collected through its sample moments. The method is used with several samples from the HIPPARCOS and Geneva-Copenhagen survey catalogues. For the large-scale distribution, the phase density function may be obtained by fitting moments up to sixth order as a product of two exponential...
Efficiency Bound of Local Z-Estimators on Discrete Sample Spaces
Many statistical models over a discrete sample space often face the computational difficulty of the normalization constant. Because of that, the maximum likelihood estimator does not work. In order to circumvent the computation difficulty, alternative estimators such as pseudo-likelihood and composite likelihood that require only a local computation over the sample space have been proposed. In ...