منابع مشابه
Policy Gradient Methods
A policy gradient method is a reinforcement learning approach that directly optimizes a parametrized control policy by gradient descent. It belongs to the class of policy search techniques that maximize the expected return of a policy in a fixed policy class while traditional value function approximation approaches derive policies from a value function. Policy gradient approaches have various a...
متن کاملPolicy Gradient Methods for Off-policy Control
Off-policy learning refers to the problem of learning the value function of a way of behaving, or policy, while following a different policy. Gradient-based off-policy learning algorithms, such as GTD and TDC/GQ [13], converge even when using function approximation and incremental updates. However, they have been developed for the case of a fixed behavior policy. In control problems, one would ...
متن کاملPolicy-Gradient Methods for Planning
Probabilistic temporal planning attempts to find good policies for acting in domains with concurrent durative tasks, multiple uncertain outcomes, and limited resources. These domains are typically modelled as Markov decision problems and solved using dynamic programming methods. This paper demonstrates the application of reinforcement learning — in the form of a policy-gradient method — to thes...
متن کاملPolicy Gradient Methods for Robot Control
Reinforcement learning offers the most general framework to take traditional robotics towards true autonomy and versatility. However, applying reinforcement learning to high dimensional movement systems like humanoid robots remains an unsolved problem. In this paper, we discuss different approaches of reinforcement learning in terms of their applicability in humanoid robotics. Methods can be co...
متن کاملAdaptive Step-Size for Policy Gradient Methods
In the last decade, policy gradient methods have significantly grown in popularity in the reinforcement–learning field. In particular, they have been largely employed in motor control and robotic applications, thanks to their ability to cope with continuous state and action domains and partial observable problems. Policy gradient researches have been mainly focused on the identification of effe...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
ژورنال
عنوان ژورنال: Scholarpedia
سال: 2010
ISSN: 1941-6016
DOI: 10.4249/scholarpedia.3698