ES Is More Than Just a Traditional Finite-Difference Approximator

Authors

  • Joel Lehman
  • Jay Chen
  • Jeff Clune
  • Kenneth O. Stanley
Abstract

An evolution strategy (ES) variant recently attracted significant attention due to its surprisingly good performance at optimizing neural networks in challenging deep reinforcement learning domains. It searches directly in the parameter space of neural networks by generating perturbations to the current set of parameters, checking their performance, and moving in the direction of higher reward. The resemblance of this algorithm to a traditional finite-difference approximation of the reward gradient in parameter space naturally leads to the assumption that it is just that. However, this assumption is incorrect. The aim of this paper is to definitively demonstrate this point empirically. ES is a gradient approximator, but optimizes for a different gradient than just reward (especially when the magnitude of candidate perturbations is high). Instead, it optimizes for the average reward of the entire population, often also promoting parameters that are robust to perturbation. This difference can channel ES into significantly different areas of the search space than gradient descent in parameter space, and also consequently to networks with significantly different properties. This unique robustness-seeking property, and its consequences for optimization, are demonstrated in several domains. They include humanoid locomotion, where networks from policy gradient-based reinforcement learning are far less robust to parameter perturbation than ES-based policies that solve the same task. While the implications of such robustness and robustness-seeking remain open to further study, the main contribution of this work is to highlight that such differences indeed exist and deserve attention.
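
The distinction drawn in the abstract can be illustrated with a minimal sketch. It is not the authors' code or experimental setup: the one-dimensional `reward` function, the function names, the sample counts, and the use of mirrored (antithetic) Gaussian perturbations are all assumptions chosen for illustration. The toy landscape has a tall, narrow peak at 0 and a lower, broad peak at 3. A finite-difference estimate follows the local reward gradient, while an ES-style estimate with a large perturbation scale follows the gradient of the population's average reward, which here points toward the broad, perturbation-robust peak.

```python
import numpy as np


def reward(theta):
    """Hypothetical toy reward: tall narrow peak at 0, lower broad peak at 3."""
    theta = np.asarray(theta, dtype=float)
    narrow = 1.0 * np.exp(-(theta ** 2) / (2 * 0.05 ** 2))
    broad = 0.8 * np.exp(-((theta - 3.0) ** 2) / (2 * 1.0 ** 2))
    return narrow + broad


def finite_difference_gradient(f, theta, eps=1e-4):
    """Central finite-difference estimate of the reward gradient at theta."""
    return float((f(theta + eps) - f(theta - eps)) / (2 * eps))


def es_gradient(f, theta, sigma, n_pairs=20000, seed=0):
    """ES-style estimate: gradient of the expected reward under Gaussian
    perturbations with standard deviation sigma.

    Mirrored sampling is an assumption made here to reduce variance.
    """
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(n_pairs)
    # Evaluate each perturbation and its mirror image around theta.
    diffs = f(theta + sigma * noise) - f(theta - sigma * noise)
    return float(np.mean(diffs * noise) / (2 * sigma))


if __name__ == "__main__":
    theta = 0.05  # just to the right of the narrow peak
    print("finite difference:", finite_difference_gradient(reward, theta))
    print("ES, sigma=0.001:  ", es_gradient(reward, theta, sigma=0.001))
    print("ES, sigma=1.0:    ", es_gradient(reward, theta, sigma=1.0))
```

At `theta = 0.05` the finite-difference estimate and the small-`sigma` ES estimate are both negative (back toward the narrow peak), whereas the large-`sigma` ES estimate is positive (toward the broad peak). In this toy setting, the two estimators agree only when perturbations are small.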

Similar resources

Prediction of Service Life in Concrete Structures based on Diffusion Model in a Marine Environment using Mesh Free, FEM and FDM Approaches

Chloride-induced corrosion is a key factor in the premature corrosion of concrete structures exposed to a marine environment. Fick's second law of diffusion is the dominant equation to model diffusion of chloride ions. This equation is traditionally solved by Finite Element Method (FEM) and Finite Difference Method (FDM). Although these methods are robust and efficient, they may face some numer...

Design of a High-Bandwidth Y-Shaped Photonic Crystal Power Splitter for TE Modes

In this paper, a Y-shaped power splitter based on a two dimensional photonic crystal (PhC) for TE modes is designed and optimized. A triangular lattice of air holes is used for Y-shaped power divider. For analyzing these structures, plane wave expansion (PWE) and finite difference time domain (FDTD) methods are used. The simulation results show that more than 98% of the input power is transmitt...

An Analysis of Temporal-Difference Learning with Function Approximation

We discuss the temporal-difference learning algorithm, as applied to approximating the cost-to-go function of an infinite-horizon discounted Markov chain. The algorithm we analyze updates parameters of a linear function approximator online during a single endless trajectory of an irreducible aperiodic Markov chain with a finite or infinite state space. We present a proof of convergence (with pr...

Solving a system of 2D Burgers' equations using Semi-Lagrangian finite difference schemes

In this paper, we aim to generalize semi-Lagrangian finite difference schemes for a system of two-dimensional (2D) Burgers' equations. Our scheme is not limited by the Courant-Friedrichs-Lewy (CFL) condition and therefore we can apply larger step size for the time variable. Proposed schemes can be implemented in parallel very well and in fact, it is a local one-dimensional (LOD) scheme which o...

Effects of geometrical and geomechanical properties on slope stability of open-pit mines using 2D and 3D finite difference methods

Slope stability analysis is one of the most important problems in mining and geotechnical engineering. Ignoring the importance of these problems can lead to significant losses. Selecting an appropriate method to analyze the slope stability requires a proper understanding of how different factors influence the outputs of the analyses. This paper evaluates the effects of considering the real geom...

Journal title:
  • CoRR

Volume abs/1712.06568

Publication date 2017