Regularly updated deterministic policy gradient algorithm
Abstract
The Deep Deterministic Policy Gradient (DDPG) algorithm is one of the most well-known reinforcement learning methods. However, this method is inefficient and unstable in practical applications. In addition, the bias and variance of the Q estimation in the target function are sometimes difficult to control. This paper proposes a Regularly Updated Deterministic (RUD) policy gradient algorithm to address these problems. It theoretically proves that the learning procedure with RUD makes better use of new data in the replay buffer than the traditional procedure. Furthermore, the low variance of the Q value in RUD is better suited to the current Clipped Double Q-learning strategy. The paper presents a comparison experiment against previous methods, an ablation experiment with the original DDPG, and further analytical experiments in Mujoco environments. The experimental results demonstrate the effectiveness and superiority of RUD.
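The abstract refers to the Q target function and to the Clipped Double Q-learning strategy (as popularized by TD3). The following is a minimal sketch of two building blocks shared by DDPG-family methods: the clipped double-Q target and the Polyak (soft) target-network update used by DDPG. It is not the RUD algorithm, whose update schedule the abstract does not specify. Plain lists of floats stand in for network outputs and parameters, and the function names `clipped_double_q_targets` and `soft_update` are hypothetical.

```python
from typing import List, Sequence


def clipped_double_q_targets(
    rewards: Sequence[float],
    dones: Sequence[bool],
    q1_next: Sequence[float],
    q2_next: Sequence[float],
    gamma: float = 0.99,
) -> List[float]:
    """Compute y = r + gamma * (1 - done) * min(Q1'(s', a'), Q2'(s', a')).

    q1_next and q2_next are the two target critics' estimates at the next
    state, assumed to be evaluated at the target actor's action a' = mu'(s').
    """
    n = len(rewards)
    if not (len(dones) == len(q1_next) == len(q2_next) == n):
        raise ValueError("all inputs must have the same length")
    targets = []
    for r, done, q1, q2 in zip(rewards, dones, q1_next, q2_next):
        # Taking the minimum of the two critics counteracts overestimation bias;
        # terminal transitions do not bootstrap from the next state.
        bootstrap = 0.0 if done else gamma * min(q1, q2)
        targets.append(r + bootstrap)
    return targets


def soft_update(target: Sequence[float], online: Sequence[float], tau: float) -> List[float]:
    """Polyak averaging of target parameters: t <- tau * o + (1 - tau) * t."""
    if len(target) != len(online):
        raise ValueError("parameter vectors must have the same length")
    return [tau * o + (1.0 - tau) * t for t, o in zip(target, online)]


if __name__ == "__main__":
    ys = clipped_double_q_targets(
        rewards=[1.0, 0.5],
        dones=[False, True],
        q1_next=[10.0, 3.0],
        q2_next=[8.0, 4.0],
        gamma=0.5,
    )
    print(ys)  # [5.0, 0.5]
    print(soft_update([1.0, 2.0], [3.0, 4.0], tau=0.5))  # [2.0, 3.0]
```

In a real implementation these values would come from batched critic networks sampled from the replay buffer; the scalar form only makes the arithmetic of the target explicit.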
Similar Resources
Deterministic Policy Gradient Algorithms
In this paper we consider deterministic policy gradient algorithms for reinforcement learning with continuous actions. The deterministic policy gradient has a particularly appealing form: it is the expected gradient of the action-value function. This simple form means that the deterministic policy gradient can be estimated much more efficiently than the usual stochastic policy gradient. To ensu...
Deterministic Policy Gradient Algorithms: Supplementary Material
A. Regularity Conditions. Within the text we have referred to regularity conditions on the MDP: Regularity conditions A.1: $p(s'|s,a)$, $\nabla_a p(s'|s,a)$, $\mu_\theta(s)$, $\nabla_\theta \mu_\theta(s)$, $r(s,a)$, $\nabla_a r(s,a)$, $p_1(s)$ are continuous in all parameters and variables $s$, $a$, $s'$ and $x$. Regularity conditions A.2: there exists a $b$ and $L$ such that $\sup_s p_1(s) < b$, $\sup_{a,s,s'} p(s'|s,a) < b$, $\sup_{a,s} r(s,a) < b$, $\sup_{a,s,s'} \|\nabla_a p(s'|s,a)$...
Deep Deterministic Policy Gradient for Urban Traffic Light Control
Traffic light timing optimization is still an active line of research despite the wealth of scientific literature on the topic, and the problem remains unsolved for any non-toy scenario. One of the key issues with traffic light optimization is the large scale of the input information that is available for the controlling agent, namely all the traffic data that is continually sampled by the traf...
A proposal for regularly updated review/survey articles: "Perpetual Reviews"
We advocate the publication of review/survey articles that will be updated regularly, both in traditional journals and novel venues. We call these "perpetual reviews." This idea naturally builds on the dissemination and archival capabilities present in the modern internet, and indeed perpetual reviews exist already in some forms. Perpetual review articles allow authors to maintain over ti...
An Algorithm Based Methodology for the Creation of a Regularly Updated Global Online Map Derived From Volunteered Geographic Information
Global online maps are an important tool and data sets for such maps are normally provided by commercial providers or public authorities. Nevertheless, the ever expanding trend of collaboratively collected geodata by hobbyists, namely Volunteered Geographic Information (VGI), increases regarding both data quantity and quality. Therefore, VGI can be considered as a real alternative data sou...
Journal
Journal title: Knowledge Based Systems
Year: 2021
ISSN: 1872-7409, 0950-7051
DOI: https://doi.org/10.1016/j.knosys.2020.106736