Behavior Policy Gradient Supplemental Material

Authors

  • Josiah P. Hanna
  • Philip S. Thomas
  • Peter Stone
  • Scott Niekum
Abstract

A. Proof of Theorem 1

In Appendix A, we give the full derivation of our primary theoretical contribution: the gradient of the variance of the importance-sampling (IS) estimator. We also present the variance gradient for the doubly robust (DR) estimator. We first derive an analytic expression for the gradient of the variance of an arbitrary unbiased off-policy policy evaluation estimator, OPE(H, θ). Importance sampling is one such estimator. From this general derivation, we obtain the gradient of the variance of the IS estimator and then extend the result to the DR estimator.
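As a rough illustration of this kind of result, here is our own sketch; it is not the authors' derivation or code. An unbiased estimator's mean v(π_e) does not depend on the behavior-policy parameters θ. The variance gradient therefore reduces to the gradient of the second moment:

∇_θ Var = ∇_θ E_{H~π_θ}[OPE(H,θ)^2].

For ordinary IS, the standard score-function argument gives:

∇_θ Var = -E_{H~π_θ}[IS(H,θ)^2 Σ_t ∇_θ log π_θ(A_t|S_t)].

The Python sketch below checks this formula under assumed conditions: a single-step (bandit) setting, deterministic rewards, and a softmax behavior policy. All function names are hypothetical. It compares the exact expectation with a Monte Carlo estimate computed from sampled actions.

```python
# Hypothetical sketch (not the authors' code): gradient of the variance of the
# one-step importance-sampling (IS) estimator with respect to the parameters
# theta of a softmax behavior policy, assuming deterministic rewards r(a).
import math
import random


def softmax(theta):
    """Behavior policy pi_theta(a): softmax over per-action logits theta."""
    m = max(theta)
    z = [math.exp(t - m) for t in theta]
    s = sum(z)
    return [x / s for x in z]


def grad_log_softmax(theta, a):
    """Score function: d/dtheta log pi_theta(a) = e_a - pi_theta."""
    pi_b = softmax(theta)
    return [(1.0 if j == a else 0.0) - pi_b[j] for j in range(len(theta))]


def is_estimate(a, theta, pi_e, rewards):
    """IS estimate from one action a ~ pi_theta: r(a) * pi_e(a) / pi_theta(a)."""
    return rewards[a] * pi_e[a] / softmax(theta)[a]


def is_variance(theta, pi_e, rewards):
    """Exact Var[IS] = E[IS^2] - v(pi_e)^2; the mean v(pi_e) does not depend on theta."""
    pi_b = softmax(theta)
    second_moment = sum(pe**2 * r**2 / pb for pe, r, pb in zip(pi_e, rewards, pi_b))
    v = sum(pe * r for pe, r in zip(pi_e, rewards))
    return second_moment - v**2


def is_variance_grad_exact(theta, pi_e, rewards):
    """Exact -E_{a ~ pi_theta}[IS(a)^2 * grad log pi_theta(a)], summing over actions."""
    pi_b = softmax(theta)
    grad = [0.0] * len(theta)
    for a, pb in enumerate(pi_b):
        weight = pb * is_estimate(a, theta, pi_e, rewards) ** 2
        for j, score in enumerate(grad_log_softmax(theta, a)):
            grad[j] -= weight * score
    return grad


def is_variance_grad_mc(theta, pi_e, rewards, n_samples, rng):
    """Monte Carlo estimate of the same gradient from actions sampled with pi_theta."""
    pi_b = softmax(theta)
    k = len(theta)
    # A bandit has only k distinct actions, so precompute IS^2 and the score per action.
    sq = [is_estimate(a, theta, pi_e, rewards) ** 2 for a in range(k)]
    scores = [grad_log_softmax(theta, a) for a in range(k)]
    grad = [0.0] * k
    for a in rng.choices(range(k), weights=pi_b, k=n_samples):
        for j in range(k):
            grad[j] -= sq[a] * scores[a][j]
    return [g / n_samples for g in grad]


if __name__ == "__main__":
    pi_e = [0.7, 0.2, 0.1]     # hypothetical evaluation policy
    rewards = [1.0, 2.0, 5.0]  # hypothetical deterministic rewards
    theta = [0.0, 0.0, 0.0]    # uniform behavior policy
    print("exact:", is_variance_grad_exact(theta, pi_e, rewards))
    print("MC:   ", is_variance_grad_mc(theta, pi_e, rewards, 100_000, random.Random(0)))
```

Stepping θ against this gradient lowers the variance of subsequent IS estimates. In this one-step setting with nonnegative rewards, the variance is minimized (reaching zero) when π_θ(a) is proportional to π_e(a) r(a).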


Similar sources

Identifying the Effect of Changing the Policy Threshold in Regression Discontinuity Models - Supplemental Appendix

This is a supplemental online appendix containing additional theoretical and empirical results. In Section 1 (Supplemental Online Appendix), we provide additional supplemental material. First are some details regarding extensions to higher-order derivatives and to larger-than-marginal changes in the threshold. Next is a second empirical application, showing our methods applied in a fuzzy design cont...

متن کامل

Toward Reproducible Computational Research: An Empirical Analysis of Data and Code Policy Adoption by Journals

Journal policy on research data and code availability is an important part of the ongoing shift toward publishing reproducible computational science. This article extends the literature by studying journal data sharing policies by year (for both 2011 and 2012) for a referent set of 170 journals. We make a further contribution by evaluating code sharing policies, supplemental materials policies,...

متن کامل

Fluidity Onset Analysis in FG Thick-Walled Spherical Tanks under Concurrent Pressure Loading and Heat Gradient

In this paper, fluidity onset analysis in FG thick-walled spherical tanks under concurrent pressure loading and heat gradient is presented. Designing thick-walled spherical tanks that hold pressurized fluids under heat loads with high heat gradients requires new approaches. Under high internal pressure and high temperature, the tank enters the plastic stage in a part of its thickn...

متن کامل

Non-linear Thermo-mechanical Bending Behavior of Thin and Moderately Thick Functionally Graded Sector Plates Using Dynamic Relaxation Method

In this study, nonlinear bending of solid and annular functionally graded (FG) sector plates subjected to transverse mechanical loading and a thermal gradient along the thickness direction is investigated. Material properties vary continuously along the plate thickness according to a power-law distribution of the volume fraction of the constituents. According to the von Kármán relation for large ...

متن کامل

Learning Rotation-Aware Features: From Invariant Priors to Equivariant Descriptors Supplemental Material

The R-FoE model of Sec. 3 of the main paper was trained on a database of 5000 natural images (50 × 50 pixels) using persistent contrastive divergence [12] (also known as stochastic maximum likelihood). Learning was done with stochastic gradient descent using mini-batches of 100 images (and model samples) for a total of 10000 (exponentially smoothed) gradient steps with an annealed learning rate...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

Publication date: 2017