Fourier Policy Gradients
Authors
Abstract
We propose a new way of deriving policy gradient updates for reinforcement learning. Our technique, based on Fourier analysis, recasts integrals that arise with expected policy gradients (EPG) as convolutions and turns them into multiplications. The obtained analytical solutions allow us to capture the low-variance benefits of EPG in a broad range of settings. For the critic, we treat trigonometric and radial basis functions, two function families with the universal approximation property. The choice of policy can be almost arbitrary, including mixtures or hybrid continuous-discrete probability distributions. Moreover, we derive a general family of sample-based estimators for stochastic policy gradients, which unifies existing results on sample-based approximation. We believe that this technique has the potential to shape the next generation of policy gradient approaches, powered by analytical results.
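As a concrete illustration of the core idea, here is a minimal Python sketch (not the authors' implementation; the policy parameters mu and sigma, and the critic weights Q_WEIGHTS, frequencies OMEGAS, and phases PHASES are arbitrary illustrative choices). With a Gaussian policy and a critic expressed in a trigonometric basis, the expected-policy-gradient integral has a closed form via the Gaussian's characteristic function, which the sketch compares against a standard Monte Carlo score-function estimate.

import numpy as np

rng = np.random.default_rng(0)

# Critic in a trigonometric basis: Q_hat(a) = sum_i w_i * cos(omega_i * a + phi_i).
Q_WEIGHTS = np.array([1.0, -0.5, 0.3])
OMEGAS = np.array([1.0, 2.0, 3.5])
PHASES = np.array([0.0, 0.7, -1.2])

def q_hat(a):
    # Critic value at a batch of actions a (shape (n,)).
    return np.cos(np.outer(a, OMEGAS) + PHASES) @ Q_WEIGHTS

# Gaussian policy pi(a) = N(mu, sigma^2); we differentiate w.r.t. the mean mu.
mu, sigma = 0.4, 0.8

def analytic_grad_mu():
    # E_{a~N(mu,sigma^2)}[cos(omega*a + phi)] = exp(-sigma^2 omega^2 / 2) * cos(omega*mu + phi):
    # the convolution of the basis with the Gaussian density becomes a product in the
    # Fourier domain, so the policy-gradient integral has a closed form.
    damping = np.exp(-0.5 * (sigma * OMEGAS) ** 2)
    return np.sum(Q_WEIGHTS * damping * (-OMEGAS) * np.sin(OMEGAS * mu + PHASES))

def monte_carlo_grad_mu(n_samples=200_000):
    # Standard score-function (likelihood-ratio) estimator, for comparison.
    a = rng.normal(mu, sigma, n_samples)
    score = (a - mu) / sigma ** 2  # d/dmu log pi(a | mu, sigma)
    return np.mean(score * q_hat(a))

print("analytic expected policy gradient  :", analytic_grad_mu())
print("Monte Carlo score-function estimate:", monte_carlo_grad_mu())

The two printed numbers agree up to Monte Carlo noise, which is the sense in which the analytical route keeps the low-variance character of EPG.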
Similar Articles
Optimizing Schrödinger Functionals Using Sobolev Gradients: Applications to Quantum Mechanics and Nonlinear Optics
In this paper we study the application of the Sobolev gradients technique to the problem of minimizing several Schrödinger functionals related to timely and difficult nonlinear problems in Quantum Mechanics and Nonlinear Optics. We show that these gradients act as preconditioners over traditional choices of descent directions in minimization methods and show a computationally inexpensive way to...
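The preconditioning idea mentioned in that abstract can be illustrated with a small sketch, kept deliberately simpler than the Schrödinger functionals treated in the paper: for a toy quadratic energy, descending along the Sobolev gradient, obtained by applying (I - d^2/dx^2)^{-1} to the L2 gradient via the FFT, tolerates a much larger stable step than plain gradient descent. The grid, source term f, and step sizes below are illustrative assumptions.

import numpy as np

N = 256
L = 2 * np.pi
x = np.linspace(0.0, L, N, endpoint=False)
k = 2 * np.pi * np.fft.fftfreq(N, d=L / N)
f = np.sin(x) + 0.3 * np.sin(3 * x)  # source term of a toy quadratic energy

def l2_gradient(u):
    # L2 gradient of E(u) = integral of (1/2)|u'|^2 + (1/2)u^2 - f*u, i.e. -u'' + u - f.
    return np.real(np.fft.ifft((k ** 2 + 1.0) * np.fft.fft(u))) - f

def sobolev_precondition(g):
    # Apply (I - d^2/dx^2)^{-1}: a pointwise division by (1 + k^2) in Fourier space.
    return np.real(np.fft.ifft(np.fft.fft(g) / (1.0 + k ** 2)))

def descend(preconditioned, steps=200):
    u = np.zeros(N)
    # Plain L2 descent needs a tiny step to stay stable at high frequencies;
    # the Sobolev direction tolerates an O(1) step.
    lr = 0.9 if preconditioned else 0.9 / (1.0 + k.max() ** 2)
    for _ in range(steps):
        g = l2_gradient(u)
        u -= lr * (sobolev_precondition(g) if preconditioned else g)
    return np.abs(l2_gradient(u)).max()  # residual after descent

print("plain L2 gradient, final residual:", descend(False))
print("Sobolev gradient, final residual :", descend(True))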
Revisiting stochastic off-policy action-value gradients
Off-policy stochastic actor-critic methods rely on approximating the stochastic policy gradient in order to derive an optimal policy. One may also derive the optimal policy by approximating the action-value gradient. The use of action-value gradients is desirable as policy improvement occurs along the direction of steepest ascent. This has been studied extensively within the context of natural ...
Policy gradients in linearly-solvable MDPs
We present policy gradient results within the framework of linearly-solvable MDPs. For the first time, compatible function approximators and natural policy gradients are obtained by estimating the cost-to-go function, rather than the (much larger) state-action advantage function as is necessary in traditional MDPs. We also develop the first compatible function approximators and natural policy g...
Appendix A: Wavefront Reconstruction Based on Fourier Series
Fourier Wavefront Reconstruction Over a Rectangular Pupil The iterative and boundary methods developed here for circular and elliptical pupils are based on a method to reconstruct the wavefront over the rectangular pupil. To fully understand this development, it is necessary to first summarize the approach used by Freischlad and Koliopoulos to derive an inverse spatial filter to reconstruct a w...
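A minimal sketch of the underlying idea, offered as an illustration rather than the exact Freischlad and Koliopoulos derivation: gradients of a phase map sampled on a rectangular grid are integrated by an inverse spatial filter in the Fourier domain, recovering the wavefront up to an unknown constant (piston) term. The test wavefront below is an arbitrary periodic choice, and its gradients are generated spectrally rather than measured.

import numpy as np

N = 64
x = np.linspace(0.0, 1.0, N, endpoint=False)
X, Y = np.meshgrid(x, x)

# A smooth, periodic "true" wavefront used only to generate test gradients.
phase = np.cos(2 * np.pi * X) + 0.5 * np.sin(2 * np.pi * (X + Y))

# Spectral differentiation: d/dx <-> i*kx, d/dy <-> i*ky in the Fourier domain.
k = 2 * np.pi * np.fft.fftfreq(N, d=x[1] - x[0])
KX, KY = np.meshgrid(k, k)
F = np.fft.fft2(phase)
gx = np.real(np.fft.ifft2(1j * KX * F))  # "measured" x-gradient
gy = np.real(np.fft.ifft2(1j * KY * F))  # "measured" y-gradient

# Inverse spatial filter (least-squares integration of the gradient field):
# phase_hat = (-i*kx*Gx - i*ky*Gy) / (kx^2 + ky^2), with the DC term set to zero.
Gx, Gy = np.fft.fft2(gx), np.fft.fft2(gy)
denom = KX ** 2 + KY ** 2
denom[0, 0] = 1.0               # avoid 0/0 at the DC frequency
P = (-1j * KX * Gx - 1j * KY * Gy) / denom
P[0, 0] = 0.0                   # piston (mean phase) is unobservable from gradients
recon = np.real(np.fft.ifft2(P))

err = (recon - phase) - (recon - phase).mean()
print("max reconstruction error:", np.abs(err).max())  # ~machine precision here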
Adaptive Step-size Policy Gradients with Average Reward Metric
In this paper, we propose a novel adaptive step-size approach for policy gradient reinforcement learning. A new metric is defined for policy gradients that measures the effect of changes on average reward with respect to the policy parameters. Since the metric directly measures the effects on the average reward, the resulting policy gradient learning employs an adaptive step-size strategy that ...
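A simplified sketch of the general idea, using a first-order variant chosen for brevity rather than the specific metric proposed in that paper: on a toy softmax bandit (the rewards vector and the target improvement delta are illustrative assumptions), the step size is rescaled at every iteration so that the predicted change in average reward equals a fixed target, instead of using a constant learning rate.

import numpy as np

rewards = np.array([1.0, 0.2, -0.5])  # toy 3-armed bandit
theta = np.zeros(3)                   # softmax policy parameters
delta = 0.01                          # target per-step improvement in average reward

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

for step in range(200):
    pi = softmax(theta)
    J = pi @ rewards                        # average reward
    # Exact policy gradient for the softmax bandit: dJ/dtheta_a = pi_a * (r_a - J).
    g = pi * (rewards - J)
    # First-order model: J(theta + eta*g) - J(theta) ~= eta * ||g||^2,
    # so pick eta to make the predicted change equal to delta.
    eta = delta / (g @ g + 1e-8)
    theta += eta * g

print("final policy        :", softmax(theta))
print("final average reward:", softmax(theta) @ rewards)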
Journal: CoRR
Volume: abs/1802.06891
Publication date: 2018