Proximal policy optimization via enhanced exploration efficiency

Authors

Abstract

The proximal policy optimization (PPO) algorithm is a deep reinforcement learning algorithm with outstanding performance, especially in continuous control tasks, but the performance of this method is still affected by its exploration ability. Based on continuous control tasks, this paper analyzes the original Gaussian action exploration mechanism of the PPO algorithm and clarifies the influence of exploration ability on performance. Afterward, aiming at the exploration problem, an exploration enhancement mechanism based on uncertainty estimation is designed in this paper. Then, we apply this theory to PPO and propose the proximal policy optimization algorithm with an intrinsic exploration module (IEM-PPO). In the experimental part, we evaluate our method on multiple tasks in the MuJoCo physical simulator and compare IEM-PPO with the curiosity-driven exploration algorithm (ICM-PPO). The results demonstrate that IEM-PPO performs better in terms of sample efficiency and cumulative reward, and has stability and robustness.
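
As a rough, hypothetical illustration of the mechanism the abstract describes, the Python sketch below adds an intrinsic bonus to the extrinsic reward, estimating uncertainty as the prediction disagreement of a small ensemble of forward dynamics models. The class names, ensemble design, and the weighting coefficient eta are assumptions made for illustration, not the paper's exact module.

# Hedged sketch: uncertainty-driven intrinsic reward feeding a PPO-style update.
# Assumption: uncertainty = variance of an ensemble of learned forward models;
# the paper's exact uncertainty-estimation module may differ.
import torch
import torch.nn as nn

class DynamicsEnsemble(nn.Module):
    def __init__(self, obs_dim, act_dim, n_models=5, hidden=64):
        super().__init__()
        self.models = nn.ModuleList([
            nn.Sequential(
                nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, obs_dim),
            )
            for _ in range(n_models)
        ])

    def intrinsic_reward(self, obs, act):
        # Predict the next observation with every ensemble member; the
        # variance across predictions serves as the uncertainty estimate.
        x = torch.cat([obs, act], dim=-1)
        preds = torch.stack([m(x) for m in self.models])  # (n_models, B, obs_dim)
        return preds.var(dim=0).mean(dim=-1)              # (B,)

def shaped_reward(r_extrinsic, obs, act, ensemble, eta=0.01):
    # eta is a hypothetical coefficient trading exploration off against the
    # task reward; the shaped reward then enters ordinary PPO advantage
    # estimation unchanged.
    with torch.no_grad():
        r_intrinsic = ensemble.intrinsic_reward(obs, act)
    return r_extrinsic + eta * r_intrinsic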


Similar resources

Proximal Policy Optimization Algorithms

We propose a new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment, and optimizing a “surrogate” objective function using stochastic gradient ascent. Whereas standard policy gradient methods perform one gradient update per data sample, we propose a novel objective function that enables multiple epochs of ...
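
For context, the clipped surrogate objective introduced in this paper is L_CLIP(θ) = E_t[ min(r_t(θ) A_t, clip(r_t(θ), 1 − ε, 1 + ε) A_t) ], where r_t(θ) is the probability ratio between the new and old policies and A_t is an advantage estimate. A minimal PyTorch sketch of that loss (function and tensor names are illustrative) follows.

# Minimal sketch of the PPO clipped surrogate loss described above.
# new_logp/old_logp are log-probabilities of the taken actions under the
# current and behavior policies; advantages come from any estimator (e.g. GAE).
import torch

def ppo_clip_loss(new_logp, old_logp, advantages, clip_eps=0.2):
    ratio = torch.exp(new_logp - old_logp)  # r_t(theta)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Negated because optimizers minimize, while PPO maximizes the surrogate.
    return -torch.min(unclipped, clipped).mean()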


Enhanced Delta-tolling: Traffic Optimization via Policy Gradient Reinforcement Learning

The prospect of widespread deployment of autonomous vehicles invites the reimagining of the multiagent systems protocols that govern traffic flow in our cities. One such possibility is the introduction of micro-tolling for fine-grained traffic flow optimization. In the micro-tolling paradigm, different toll values are assigned to different links within a congestable traffic network. Self-intere...


Derivative-Free Optimization Via Proximal Point Methods

Derivative-Free Optimization (DFO) examines the challenge of minimizing (or maximizing) a function without explicit use of derivative information. Many standard techniques in DFO are based on using model functions to approximate the objective function, and then applying classic optimization methods on the model function. For example, the details behind adapting steepest descent, conjugate gradi...
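
As a toy sketch of the model-based approach described above (an illustration, not the paper's method), the code below fits a local linear model of the objective from finite-difference samples and takes a steepest-descent step on that model; the sampling radius and step size are assumptions chosen for the example.

# Toy model-based derivative-free optimization: approximate the objective
# locally with a linear model built from sampled values, then take a
# steepest-descent step on that model. Purely illustrative.
import numpy as np

def dfo_linear_model_step(f, x, radius=0.05, step=0.5):
    fx = f(x)
    # Sample f along each coordinate direction to build the linear model
    # m(s) = fx + g @ s, with g estimated by forward differences.
    g = np.array([(f(x + radius * e) - fx) / radius for e in np.eye(x.size)])
    return x - step * g  # steepest descent on the model

# Usage: minimize a simple quadratic without any derivative information.
f = lambda x: np.sum((x - 1.0) ** 2)
x = np.zeros(2)
for _ in range(100):
    x = dfo_linear_model_step(f, x)  # x approaches the minimizer near 1.0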


Variational Policy Search via Trajectory Optimization

In order to learn effective control policies for dynamical systems, policy search methods must be able to discover successful executions of the desired task. While random exploration can work well in simple domains, complex and high-dimensional tasks present a serious challenge, particularly when combined with high-dimensional policies that make parameter-space exploration infeasible. We present...


Identifying Reusable Macros for Efficient Exploration via Policy Compression

Reinforcement Learning agents often need to solve not a single task, but several tasks pertaining to the same domain; in particular, each task corresponds to an MDP drawn from a family of related MDPs (a domain). An agent learning in this setting should be able to exploit policies it has learned in the past, for a given set of sample tasks, in order to more rapidly acquire policies for novel tasks. ...



Journal

Journal title: Information Sciences

Year: 2022

ISSN: 0020-0255, 1872-6291

DOI: https://doi.org/10.1016/j.ins.2022.07.111