Deep Generative Stochastic Networks Trainable by Backprop
Authors
Abstract
We introduce a novel training principle for probabilistic models that is an alternative to maximum likelihood. The proposed Generative Stochastic Networks (GSN) framework is based on learning the transition operator of a Markov chain whose stationary distribution estimates the data distribution. The transition distribution of the Markov chain is conditional on the previous state and generally involves a small move, so this conditional distribution has fewer dominant modes, becoming unimodal in the limit of small moves. It is therefore easier to learn: its partition function is easier to approximate, and training becomes more like supervised function approximation, with gradients that can be obtained by backprop. We provide theorems that generalize recent work on the probabilistic interpretation of denoising autoencoders, and along the way obtain a justification for dependency networks and generalized pseudolikelihood, together with a definition of an appropriate joint distribution and sampling mechanism even when the conditionals are not consistent. GSNs can handle missing inputs and can sample subsets of variables given the rest. We validate these theoretical results with experiments on two image datasets, using an architecture that mimics the Deep Boltzmann Machine Gibbs sampler but allows training to proceed with simple backprop and without layerwise pretraining.
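To make the training principle concrete, the sketch below trains a simple denoising transition operator P(X | X̃) by ordinary backprop, in the spirit of the abstract; it is a minimal illustration, not the paper's DBM-like architecture. The layer sizes, the Gaussian corruption C(X̃ | X), and the Bernoulli output parameterization are illustrative assumptions.

```python
# Minimal sketch of the GSN training idea: learn the reconstruction
# distribution P_theta(X | X~) with plain backprop, where X~ ~ C(X~ | X)
# is a small corruption of X. No partition function is needed.
import torch
import torch.nn as nn

class TransitionOperator(nn.Module):
    """Parameterizes P_theta(X | X~) as factorized Bernoulli means (assumed form)."""
    def __init__(self, dim=784, hidden=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden), nn.ReLU(),
            nn.Linear(hidden, dim), nn.Sigmoid(),  # pixel-wise Bernoulli means
        )

    def forward(self, x_tilde):
        return self.net(x_tilde)

def corrupt(x, noise_std=0.3):
    """C(X~ | X): a small local move around X (Gaussian noise, clipped to [0, 1])."""
    return (x + noise_std * torch.randn_like(x)).clamp(0.0, 1.0)

def train_step(model, opt, x):
    """Denoising objective: supervised reconstruction loss, gradients by backprop."""
    x_tilde = corrupt(x)
    recon = model(x_tilde)
    loss = nn.functional.binary_cross_entropy(recon, x)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```

A possible usage pattern is `model = TransitionOperator()`, `opt = torch.optim.Adam(model.parameters(), lr=1e-3)`, then calling `train_step(model, opt, batch)` over minibatches of data scaled to [0, 1].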
Related papers
GSNs: Generative Stochastic Networks
We introduce a novel training principle for generative probabilistic models that is an alternative to maximum likelihood. The proposed Generative Stochastic Networks (GSN) framework generalizes Denoising Auto-Encoders (DAE) and is based on learning the transition operator of a Markov chain whose stationary distribution estimates the data distribution. The transition distribution is a conditiona...
Supplemental Material for: Deep Generative Stochastic Networks Trainable by Backprop
Let Pθn(X|X̃) be a denoising auto-encoder that has been trained on n training examples. Pθn(X|X̃) assigns a probability to X, given X̃, when X̃ ∼ C(X̃|X). This estimator defines a Markov chain Tn obtained by alternately sampling an X̃ from C(X̃|X) and an X from Pθn(X|X̃). Let πn be the asymptotic distribution of the chain defined by Tn, if it exists. The following theorem is proven by Bengio et al. ...
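The chain Tn described above can be run mechanically once a denoising model is available. The sketch below assumes the `TransitionOperator` and `corrupt` helpers from the earlier sketch and a factorized Bernoulli form for Pθn(X|X̃); both are illustrative assumptions rather than the paper's exact setup.

```python
# Sketch of the Markov chain T_n: alternately sample X~ from C(X~ | X)
# and X from P_theta_n(X | X~). For many steps, the samples approximate
# the asymptotic distribution pi_n (when it exists).
import torch

@torch.no_grad()
def sample_chain(model, x0, n_steps=100):
    """Run T_n for n_steps starting from x0; returns the visited states."""
    x = x0
    samples = []
    for _ in range(n_steps):
        x_tilde = corrupt(x)        # X~ ~ C(X~ | X)
        probs = model(x_tilde)      # means of P_theta_n(X | X~)
        x = torch.bernoulli(probs)  # X ~ P_theta_n(X | X~)
        samples.append(x)
    return samples
```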
Generative Adversarial Perturbations
In this paper, we propose novel generative models for creating adversarial examples, slightly perturbed images resembling natural images but maliciously crafted to fool pre-trained models. We present trainable deep neural networks for transforming images to adversarial perturbations. Our proposed models can produce image-agnostic and image-dependent perturbations for targeted and nontargeted at...
Max-Margin Deep Generative Models
Deep generative models (DGMs) are effective at learning multilayered representations of complex data and at performing inference on input data by exploiting their generative ability. However, little work has been done on examining or empowering the discriminative ability of DGMs for making accurate predictions. This paper presents max-margin deep generative models (mmDGMs), which explore the strongly ...
Generative-Discriminative Variational Model for Visual Recognition
The paradigm shift from shallow classifiers with hand-crafted features to end-to-end trainable deep learning models has shown significant improvements on supervised learning tasks. Despite the promising power of deep neural networks (DNN), how to alleviate overfitting during training has been a research topic of interest. In this paper, we present a Generative-Discriminative Variational Model (G...