Maximum Reconstruction Estimation for Generative Latent-Variable Models

Authors

  • Yong Cheng
  • Yang Liu
  • Wei Xu
Abstract

Generative latent-variable models are important for natural language processing because they provide compact representations of data. Since conventional maximum likelihood estimation (MLE) is prone to focusing on explaining irrelevant but common correlations in the data, we instead apply maximum reconstruction estimation (MRE) to learning generative latent-variable models: the goal is to find model parameters that maximize the probability of reconstructing the observed data. We develop tractable algorithms that directly learn hidden Markov models and IBM translation models under the MRE criterion, without introducing a separate reconstruction model to facilitate efficient inference. Experiments on unsupervised part-of-speech induction and unsupervised word alignment show that our approach enables generative latent-variable models to better discover the intended correlations in data and significantly outperforms maximum likelihood estimators.

* Yang Liu is the corresponding author: liuyang2011@tsinghua.edu.cn.

Introduction

The need to learn latent structures from unlabeled data arises in many problems in natural language processing (NLP), including part-of-speech (POS) induction (Merialdo 1994; Johnson 2007), word alignment (Brown et al. 1993; Vogel, Ney, and Tillmann 1996), syntactic parsing (Klein and Manning 2004; Smith and Eisner 2005), and semantic parsing (Poon and Domingos 2009). Generative latent-variable models such as hidden Markov models (HMMs) and latent-variable probabilistic context-free grammars (L-PCFGs) have been widely used for unsupervised structured prediction because of their compact representations of data.

Maximum likelihood estimation (MLE) is the standard criterion for training generative latent-variable models: it maximizes the likelihood of the observed data by marginalizing over latent variables, typically via the Expectation-Maximization (EM) algorithm. However, recent studies have revealed a significant problem with MLE: it may guide the model toward explaining irrelevant but common correlations in the data (Ganchev et al. 2010). For example, in unsupervised POS induction, the estimated tag probabilities are often disrupted by high-frequency words such as "," and "is" (see Table 3).

In this work, we introduce maximum reconstruction estimation (MRE) (Hinton, Osindero, and Teh 2006; Vincent et al. 2008; Bengio 2009; Vincent et al. 2010; Socher et al. 2011; Ammar, Dyer, and Smith 2014; Alain et al. 2015) for learning generative latent-variable models. The basic idea is to circumvent irrelevant but common correlations by maximizing the probability of reconstructing the observed data. The new objective is based on an encoding-reconstruction framework: first generate a latent structure conditioned on the observed data (encoding), then re-generate the observation from the latent structure (reconstruction).
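One natural way to write this two-step objective down, in the notation used below, is the following sketch; the paper's exact formulation (e.g., sentence-level vs. corpus-level objectives) may differ in detail:

```latex
% Encoding: posterior over latent structures under the model
P(z \mid x; \theta) = \frac{P(x, z; \theta)}{\sum_{z'} P(x, z'; \theta)}

% Reconstruction: re-generate an observation from the latent structure
P(\tilde{x} \mid x; \theta) = \sum_{z} P(z \mid x; \theta) \, P(\tilde{x} \mid z; \theta)

% MRE: choose parameters that maximize the probability of
% reconstructing the observed data itself (\tilde{x} = x)
\hat{\theta} = \operatorname*{arg\,max}_{\theta} \log P(x \mid x; \theta)
```

The point of contrast with Ammar, Dyer, and Smith (2014) is that the same θ appears in both the encoding and reconstruction factors, rather than a separate reconstruction model being introduced.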
Our approach has the following advantages:

1. Direct learning of model parameters. As we are only interested in learning the parameters of the data distribution (i.e., P(x;θ)), it is challenging to define the reconstruction distribution (i.e., P(x|x;θ)) in terms of the data distribution parameters θ, due to the intricate dependencies between sub-structures in generative latent-variable models. While previous work had to maintain separate sets of model parameters for encoding and reconstruction to facilitate efficient inference (Ammar, Dyer, and Smith 2014), our approach directly learns the intended model parameters.

2. Tractable inference. Our approach retains the same tractability as its MLE counterpart. In this work, we apply MRE to two classic generative latent-variable models, HMMs and IBM translation models, and design efficient dynamic programming algorithms that calculate the required expectations exactly.

We evaluate our approach on unsupervised POS induction and unsupervised word alignment. Experiments show that MRE enables generative latent-variable models to better discover the intended correlations in data and thus leads to significant improvements over MLE.

Maximum Reconstruction Estimation

Let x be an observation, z be a latent structure, and P(x;θ) be a generative latent-variable model parameterized by θ:

P(x; \theta) = \sum_{z} P(x, z; \theta)
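To see why tractability is preserved, the toy sketch below (hypothetical code, not from the paper) computes the reconstruction probability P(x|x;θ) for a small HMM both by brute-force enumeration over tag sequences and by a forward-style dynamic program. It assumes a reconstruction that factorizes over positions given the tags, which is one natural instantiation of the encoding-reconstruction recipe for HMMs; the names (`normalize`, `recon_forward`) and the toy sizes K, V are illustrative. The DP version only replaces the emission table B with its elementwise square, so the cost stays O(TK²), exactly as with MLE.

```python
import numpy as np
from itertools import product

# Toy first-order HMM: K hidden tags, V word types.
# theta = (pi, A, B): initial, transition, and emission probabilities.
# The SAME parameters serve for both encoding P(z|x; theta) and
# reconstruction P(x|z; theta), mirroring the "direct learning" point above.

rng = np.random.default_rng(0)
K, V = 3, 5

def normalize(m):
    return m / m.sum(axis=-1, keepdims=True)

pi = normalize(rng.random(K))        # P(z_1)
A  = normalize(rng.random((K, K)))   # P(z_{t+1} | z_t)
B  = normalize(rng.random((K, V)))   # P(x_t | z_t)

def joint(x, z):
    """P(x, z; theta) for one tag sequence z."""
    p = pi[z[0]] * B[z[0], x[0]]
    for t in range(1, len(x)):
        p *= A[z[t - 1], z[t]] * B[z[t], x[t]]
    return p

def recon_brute_force(x):
    """P(x | x; theta) = sum_z P(z | x) * prod_t P(x_t | z_t),
    assuming reconstruction factorizes over positions given z."""
    px = sum(joint(x, z) for z in product(range(K), repeat=len(x)))
    num = sum(joint(x, z) * np.prod([B[z[t], x[t]] for t in range(len(x))])
              for z in product(range(K), repeat=len(x)))
    return num / px

def recon_forward(x):
    """Same quantity via dynamic programming: the numerator is an
    ordinary forward pass run with squared emissions, so the overall
    cost is O(T K^2), identical to MLE's forward algorithm."""
    def forward(emit):
        alpha = pi * emit[:, x[0]]
        for t in range(1, len(x)):
            alpha = (alpha @ A) * emit[:, x[t]]
        return alpha.sum()
    return forward(B * B) / forward(B)

x = [0, 3, 1, 4]
assert np.isclose(recon_brute_force(x), recon_forward(x))
print(recon_forward(x))
```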


Similar Articles

Accuracy of latent-variable estimation in Bayesian semi-supervised learning

Hierarchical probabilistic models, such as Gaussian mixture models, are widely used for unsupervised learning tasks. These models consist of observable and latent variables, which represent the observable data and the underlying data-generation process, respectively. Unsupervised learning tasks, such as cluster analysis, are regarded as estimations of latent variables based on the observable on...


Salience Estimation via Variational Auto-Encoders for Multi-Document Summarization

We propose a new unsupervised sentence salience framework for Multi-Document Summarization (MDS), which can be divided into two components: latent semantic modeling and salience estimation. For latent semantic modeling, a neural generative model called Variational Auto-Encoders (VAEs) is employed to describe the observed sentences and the corresponding latent semantic representations. Neural va...


Function Optimization with Latent Variable Models

Most estimation of distribution algorithms (EDAs) try to explicitly represent the relationships between variables with factorization techniques or with graphical models such as Bayesian networks. In this paper, we propose to use latent variable models such as the Helmholtz machine and probabilistic principal component analysis for capturing the probabilistic distribution of given data. The latent...


Lecture 20: Spectral Methods

For simplicity we will focus on a simple Gaussian mixture model. Consider a mixture of k spherical Gaussians in R^d, given by the following generative process. Let w ∈ Δ([k]) denote a distribution and let μ_1, ..., μ_k ∈ R^d be the mean vectors. Each point x_i is generated by first choosing a component h_i ∼ w and then drawing x_i ∼ N(μ_{h_i}, I). We are given n samples x_1, ..., x_n drawn according to this pr...


Attribute2Image: Conditional Image Generation from Visual Attributes

This paper investigates a novel problem of generating images from visual attributes. We model the image as a composite of foreground and background and develop a layered generative model with disentangled latent variables that can be learned end-to-end using a variational auto-encoder. We experiment with natural images of faces and birds and demonstrate that the proposed models are capable of g...




Publication date: 2017