M-A3C: A Mean-Asynchronous Advantage Actor-Critic Reinforcement Learning Method for Real-Time Gait Planning of Biped Robot
Authors
Abstract
Bipedal walking is a challenging task for humanoid robots. In this study, we develop a lightweight reinforcement learning method for real-time gait planning of a biped robot. We regard bipedal walking as a process in which the robot constantly interacts with its environment, judges the quality of its control actions through the observed state, and then adjusts its strategy. A mean-asynchronous advantage actor-critic (M-A3C) algorithm is proposed to handle the continuous state and action spaces and to learn the final gait directly, without introducing a reference gait. We use the multiple sub-agents of M-A3C to train virtual robots independently and simultaneously on a physical simulation platform. The trained model is then transferred to the actual robot to reduce the number of training trials on the hardware, improve training speed, and ensure reliable gait acquisition. Finally, a biped robot is designed and fabricated to verify the effectiveness of the method. Various experiments show that the proposed method can achieve stable real-time gait planning for the robot.
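As a rough illustration of the asynchronous actor-critic pattern described in the abstract, the sketch below runs several worker threads that each roll out a toy environment and push advantage-weighted gradient updates into a shared linear policy and value function. The toy environment (`toy_env_step`), the linear function approximators, and the hyper-parameters are assumptions for illustration only; this is not the paper's M-A3C, which operates on continuous state and action spaces.

```python
# A minimal sketch of the asynchronous advantage actor-critic pattern: several
# worker threads roll out a toy environment independently and push gradient
# updates into a shared global model. All quantities here are illustrative.
import threading
import numpy as np

N_STATE, N_ACTION, N_WORKERS, GAMMA = 4, 2, 4, 0.99

# Shared global parameters: a linear softmax policy and a linear value function.
global_params = {
    "policy": np.zeros((N_STATE, N_ACTION)),
    "value": np.zeros(N_STATE),
}
lock = threading.Lock()

def toy_env_step(state, action, rng):
    """Hypothetical stand-in for the physics simulator: random transition,
    reward favours action 0 when the first state feature is positive."""
    reward = 1.0 if (state[0] > 0) == (action == 0) else -1.0
    return rng.standard_normal(N_STATE), reward

def worker(seed):
    rng = np.random.default_rng(seed)
    state = rng.standard_normal(N_STATE)
    for _ in range(2000):
        with lock:                       # snapshot the shared parameters
            w_pi = global_params["policy"].copy()
            w_v = global_params["value"].copy()
        logits = state @ w_pi
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        action = rng.choice(N_ACTION, p=probs)
        next_state, reward = toy_env_step(state, action, rng)
        # One-step advantage estimate: A = r + gamma * V(s') - V(s).
        advantage = reward + GAMMA * next_state @ w_v - state @ w_v
        # Policy gradient grad log pi(a|s) * A and semi-gradient TD value update.
        grad_log = -np.outer(state, probs)
        grad_log[:, action] += state
        with lock:                       # asynchronous update of the shared model
            global_params["policy"] += 1e-2 * advantage * grad_log
            global_params["value"] += 1e-2 * advantage * state
        state = next_state

threads = [threading.Thread(target=worker, args=(i,)) for i in range(N_WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("trained policy weights:\n", global_params["policy"])
```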
Similar Articles
Reinforcement Learning for Biped Robot
Animal rhythmic movements such as locomotion are considered to be controlled by neural circuits called central pattern generators (CPGs), which generate oscillatory signals. Motivated by such biological mechanisms, rhythmic movements controlled by CPGs have been studied. As an autonomous learning framework for the CPG controller, we propose a reinforcement learning method, which is called the...
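A minimal sketch of the kind of oscillatory signal a CPG controller produces, assuming two coupled phase oscillators locked in anti-phase (e.g., driving left and right hip joints); the coupling law and gains are illustrative assumptions, not the controller from the cited work.

```python
# Two coupled phase oscillators held in anti-phase: a toy stand-in for a CPG
# generating rhythmic joint commands. Frequencies and gains are illustrative.
import numpy as np

def cpg_rollout(steps=1000, dt=0.01, omega=2 * np.pi, k=5.0):
    phase = np.array([0.0, 0.5])          # initial phases (rad)
    signals = []
    for _ in range(steps):
        # Each oscillator advances at its natural frequency and is pulled
        # toward a pi phase offset from its partner (anti-phase gait).
        d0 = omega + k * np.sin(phase[1] - phase[0] - np.pi)
        d1 = omega + k * np.sin(phase[0] - phase[1] - np.pi)
        phase += dt * np.array([d0, d1])
        signals.append(np.sin(phase))      # oscillatory joint commands
    return np.array(signals)

print(cpg_rollout()[-5:])                  # last few left/right commands
```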
Stable Gait Planning and Robustness Analysis of a Biped Robot with One Degree of Underactuation
In this paper, stability analysis of walking gaits and robustness analysis are developed for a five-link and four-actuator biped robot. Stability conditions are derived by studying unactuated dynamics and using the Poincaré map associated with periodic walking gaits. A stable gait is designed by an optimization process satisfying physical constraints and stability conditions. Also, considering...
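The Poincaré-map argument mentioned in this summary can be stated compactly; the notation below is a generic sketch, not the cited paper's formulation.

```latex
% A periodic gait corresponds to a fixed point x^* of the Poincare return map P,
% sampled once per step on a section S (e.g., at foot impact):
%   x_{k+1} = P(x_k), with P(x^*) = x^*.
% Local (orbital) exponential stability holds when the linearized map has
% spectral radius below one:
\[
  \delta x_{k+1} \approx \left.\frac{\partial P}{\partial x}\right|_{x^*}\,\delta x_k,
  \qquad
  \max_i \left|\lambda_i\!\left(\left.\frac{\partial P}{\partial x}\right|_{x^*}\right)\right| < 1 .
\]
```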
Dynamic Control with Actor-Critic Reinforcement Learning
Contents (excerpt): 4 Actor-Critic Marble Control; 4.1 R-code; 4.2 The critic; 4.3 Unstable actors; 4.4 Trading off stability against...
Supervised Actor-Critic Reinforcement Learning
Editor’s Summary: Chapter ?? introduced policy gradients as a way to improve on stochastic search of the policy space when learning. This chapter presents supervised actor-critic reinforcement learning as another method for improving the effectiveness of learning. With this approach, a supervisor adds structure to a learning problem and supervised learning makes that structure part of an actor-...
Mean Actor-Critic
We propose a new algorithm, Mean Actor-Critic (MAC), for discrete-action continuous-state reinforcement learning. MAC is a policy gradient algorithm that uses the agent’s explicit representation of all action values to estimate the gradient of the policy, rather than using only the actions that were actually executed. This significantly reduces variance in the gradient updates and removes the n...
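The contrast between the usual sampled-action policy-gradient estimator and the MAC-style estimator that sums over all action values can be sketched as follows; the linear-softmax policy and the inputs are illustrative assumptions, not the algorithm's published implementation.

```python
# Sketch: the standard likelihood-ratio gradient uses only the executed action,
# while the MAC-style estimator sums grad pi(a|s) * Q(s, a) over every action.
import numpy as np

def softmax(logits):
    z = np.exp(logits - logits.max())
    return z / z.sum()

def sampled_action_gradient(state, theta, action, q_value):
    """Standard estimator: grad log pi(a|s) * Q(s, a) for the executed action."""
    probs = softmax(state @ theta)
    grad_log = -np.outer(state, probs)
    grad_log[:, action] += state
    return grad_log * q_value

def mean_actor_critic_gradient(state, theta, q_values):
    """MAC-style estimator: sum over all actions of grad pi(a|s) * Q(s, a)."""
    probs = softmax(state @ theta)
    grad = np.zeros_like(theta)
    for a, q in enumerate(q_values):
        grad_pi_a = probs[a] * (-np.outer(state, probs))
        grad_pi_a[:, a] += probs[a] * state    # gradient of pi(a|s) w.r.t. theta
        grad += grad_pi_a * q
    return grad

state = np.array([1.0, -0.5, 0.2])
theta = np.zeros((3, 2))
print(mean_actor_critic_gradient(state, theta, q_values=np.array([1.0, 0.3])))
```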
Journal
Journal title: IEEE Access
Year: 2022
ISSN: 2169-3536
DOI: https://doi.org/10.1109/access.2022.3176608