DiCE: The Infinitely Differentiable Monte-Carlo Estimator

Authors

  • Jakob N. Foerster
  • Gregory Farquhar
  • Maruan Al-Shedivat
  • Tim Rocktäschel
  • Eric P. Xing
  • Shimon Whiteson
Abstract

The score function estimator is widely used for estimating gradients of stochastic objectives in Stochastic Computation Graphs (SCG), e.g., in reinforcement learning and meta-learning. While deriving the first order gradient estimators by differentiating a surrogate loss (SL) objective is computationally and conceptually simple, using the same approach for higher order gradients is more challenging. Firstly, analytically deriving and implementing such estimators is laborious and not compliant with automatic differentiation. Secondly, repeatedly applying SL to construct new objectives for each order gradient involves increasingly cumbersome graph manipulations. Lastly, to match the first order gradient under differentiation, SL treats part of the cost as a fixed sample, which we show leads to missing and wrong terms for higher order gradient estimators. To address all these shortcomings in a unified way, we introduce DiCE, which provides a single objective that can be differentiated repeatedly, generating correct gradient estimators of any order in SCGs. Unlike SL, DiCE relies on automatic differentiation for performing the requisite graph manipulations. We verify the correctness of DiCE both through a proof and through numerical evaluation of the DiCE gradient estimates. We also use DiCE to propose and evaluate a novel approach for multi-agent learning. Our code is available at https://goo.gl/xkkGxN.
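
The claim that one objective can be differentiated repeatedly can be illustrated with a small sketch. It assumes the operator described in the full paper (not stated in this abstract), often written as magic_box(tau) = exp(tau - stop_gradient(tau)), which evaluates to 1 but carries the derivatives of the log-probability tau. Everything else here is a hypothetical setup chosen for illustration: a single Bernoulli variable with probability sigmoid(theta), made-up costs, and a tiny second-order `Jet` class standing in for an automatic differentiation library. The surrogate-loss comparison is the naive "differentiate log-prob times cost twice" case, a simplification of the SL construction the abstract refers to.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Jet:
    """Value plus first and second derivative w.r.t. one scalar parameter."""
    v: float
    d1: float = 0.0
    d2: float = 0.0

    def __add__(self, o):
        o = lift(o)
        return Jet(self.v + o.v, self.d1 + o.d1, self.d2 + o.d2)
    __radd__ = __add__

    def __sub__(self, o):
        o = lift(o)
        return Jet(self.v - o.v, self.d1 - o.d1, self.d2 - o.d2)

    def __mul__(self, o):
        o = lift(o)
        return Jet(self.v * o.v,
                   self.d1 * o.v + self.v * o.d1,
                   self.d2 * o.v + 2 * self.d1 * o.d1 + self.v * o.d2)
    __rmul__ = __mul__


def lift(x):
    """Treat plain numbers as constants (zero derivatives)."""
    return x if isinstance(x, Jet) else Jet(float(x))


def exp(x):
    e = math.exp(x.v)
    return Jet(e, e * x.d1, e * (x.d1 ** 2 + x.d2))


def log(x):
    r = x.d1 / x.v
    return Jet(math.log(x.v), r, x.d2 / x.v - r ** 2)


def stop_gradient(x):
    """Keep the value, drop all derivative information."""
    return Jet(x.v)


def magic_box(tau):
    """Evaluates to 1; its derivatives reproduce score-function terms."""
    return exp(tau - stop_gradient(tau))


def log_prob(x, theta):
    """log p(x) for x ~ Bernoulli(sigmoid(theta)) (hypothetical model)."""
    sign = -1.0 if x == 1 else 1.0
    return -1.0 * log(1.0 + exp(sign * theta))


COST = {0: 1.0, 1: 3.0}  # made-up costs f(0), f(1)


def expected(objective, theta):
    """Exact expectation over x of the differentiated per-sample objective."""
    t = Jet(theta, 1.0)  # seed: d(theta)/d(theta) = 1
    total = Jet(0.0)
    for x in (0, 1):
        lp = log_prob(x, t)
        # Sampling weights are plain floats: samples carry no gradient.
        total = total + math.exp(lp.v) * objective(lp, COST[x])
    return total


theta = 0.3
dice = expected(lambda lp, c: magic_box(lp) * c, theta)
surrogate = expected(lambda lp, c: lp * c, theta)

# Closed-form derivatives of J(theta) = p*f(1) + (1-p)*f(0), p = sigmoid(theta).
p = 1.0 / (1.0 + math.exp(-theta))
true_d1 = p * (1 - p) * (COST[1] - COST[0])
true_d2 = p * (1 - p) * (1 - 2 * p) * (COST[1] - COST[0])

print("true     :", true_d1, true_d2)
print("DiCE     :", dice.d1, dice.d2)
print("surrogate:", surrogate.d1, surrogate.d2)
```

In this toy setting both objectives agree with the true first derivative in expectation, while only the magic_box objective also matches the second derivative. With a real autodiff library, `stop_gradient` would typically be the library's own detach or stop-gradient primitive rather than the hand-written `Jet` version above.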

Similar resources

Positive-Shrinkage and Pretest Estimation in Multiple Regression: A Monte Carlo Study with Applications

Consider a problem of predicting a response variable using a set of covariates in a linear regression model. If it is a priori known or suspected that a subset of the covariates does not significantly contribute to the overall fit of the model, a restricted model that excludes these covariates may be sufficient. If, on the other hand, the subset provides useful information, shrinkage meth...

A New Ridge Estimator in Linear Measurement Error Model with Stochastic Linear Restrictions

In this paper, we propose a new ridge-type estimator called the new mixed ridge estimator (NMRE) by unifying the sample and prior information in linear measurement error model with additional stochastic linear restrictions. The new estimator is a generalization of the mixed estimator (ME) and ridge estimator (RE). The performances of this new estimator and mixed ridge estimator (MRE) against th...

Part pose statistics: estimators and experiments

Many of the most fundamental examples in probability involve the pose statistics of coins and dice as they are dropped on a flat surface. For these parts, the probability assigned to each stable face is justified based on part symmetry, although most gamblers are familiar with the possibility of loaded dice. In industrial part feeding, parts also arrive in random orientations. We consider the f...

A CLT for Infinitely Stratified Estimators, with Applications to Debiased MLMC

This paper develops a general central limit theorem (CLT) for post-stratified Monte Carlo estimators with an associated infinite number of strata. In addition, consistency of the corresponding variance estimator is established in the same setting. With these results in hand, one can then construct asymptotically valid confidence interval procedures for such infinitely stratified estimators. We ...

Nonparametric Estimation of an Additive Quantile Regression Model

This paper is concerned with estimating the additive components of a nonparametric additive quantile regression model. We develop an estimator that is asymptotically normally distributed with a rate of convergence in probability of n^{-r/(2r+1)} when the additive components are r-times continuously differentiable for some r ≥ 2. This result holds regardless of the dimension of the covariates and, ...

Journal title:
  • CoRR

Volume: abs/1802.05098

Publication date: 2018