Fitted Natural Actor-Critic: A New Algorithm for Continuous State-Action MDPs
Authors
Abstract
In this paper we address reinforcement learning problems with continuous state-action spaces. We propose a new algorithm, fitted natural actor-critic (FNAC), that extends the work in [1] to allow for general function approximation and data reuse. We combine the natural actor-critic architecture [1] with a variant of fitted value iteration using importance sampling. The method thus obtained combines the appealing features of both approaches while overcoming their main weaknesses: the use of a gradient-based actor readily overcomes the difficulties that regression methods face when optimizing policies over continuous action spaces; in turn, the use of a regression-based critic allows for efficient use of data and avoids the convergence problems that TD-based critics often exhibit. We establish the convergence of our algorithm and illustrate its application in a simple continuous-state, continuous-action problem.
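To make the architecture concrete, here is a minimal sketch of one FNAC-style iteration. It is an illustration under stated assumptions, not the paper's implementation: it assumes a one-dimensional Gaussian policy with linear features, reweights off-policy data by importance ratios pi(a|s)/beta(a|s), fits the critic by repeated regression as in fitted value iteration, and uses the compatible-feature result that the fitted advantage weights give the natural policy gradient. All names (phi, score, fnac_step) are hypothetical.

```python
import numpy as np

# Illustrative sketch only: one fitted-natural-actor-critic-style update for a
# 1-D Gaussian policy with linear features. Feature choices and the fitting
# schedule are assumptions, not the paper's implementation.

def phi(s):
    # State features (assumed: raw state plus a bias term).
    return np.array([s, 1.0])

def score(theta, s, a, sigma):
    # Compatible features: grad_theta log pi(a|s) for pi = N(theta^T phi(s), sigma^2).
    return (a - theta @ phi(s)) / sigma**2 * phi(s)

def fnac_step(theta, batch, gamma=0.95, sigma=0.5, lr=0.1, sweeps=10):
    """One update from a batch of (s, a, r, s_next, beta_prob) transitions,
    where beta_prob is the behaviour policy's density for the logged action."""
    # Importance weights pi(a|s)/beta(a|s) let the regressions reuse data
    # gathered under older policies.
    def pi_prob(s, a):
        mu = theta @ phi(s)
        return np.exp(-0.5 * ((a - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    rho = np.array([pi_prob(s, a) / b for s, a, _, _, b in batch])
    sw = np.sqrt(rho)[:, None]          # sqrt-weights give weighted least squares

    # Regression-based ("fitted") critic: repeatedly fit V(s) = w_v^T phi(s)
    # to one-step bootstrapped targets, as in fitted value iteration.
    X = np.array([phi(s) for s, *_ in batch])
    w_v = np.zeros(X.shape[1])
    for _ in range(sweeps):
        y = np.array([r + gamma * (w_v @ phi(s2)) for _, _, r, s2, _ in batch])
        w_v = np.linalg.lstsq(sw * X, sw[:, 0] * y, rcond=None)[0]

    # Advantage regression on compatible features; for this policy class the
    # fitted weights w_a coincide with the natural policy gradient.
    Psi = np.array([score(theta, s, a, sigma) for s, a, *_ in batch])
    td = np.array([r + gamma * (w_v @ phi(s2)) - w_v @ phi(s)
                   for s, _, r, s2, _ in batch])
    w_a = np.linalg.lstsq(sw * Psi, sw[:, 0] * td, rcond=None)[0]

    return theta + lr * w_a             # natural-gradient actor update
```

In practice the critic would use richer features and the exploration variance would also be adapted; the sketch only shows how the regression-based critic and the natural-gradient actor fit together.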
Similar papers
Actor-Critic for Linearly-Solvable Continuous MDP with Partially Known Dynamics
In many robotic applications, some aspects of the system dynamics can be modeled accurately while others are difficult to obtain or model. We present a novel reinforcement learning (RL) method for continuous state and action spaces that learns with partial knowledge of the system and without active exploration. It solves linearly-solvable Markov decision processes (L-MDPs), which are well suite...
How to Rein in the Volatile Actor: A New Bounded Perspective
Actor-critic algorithms are amongst the most well-studied reinforcement learning algorithms that can be used to solve Markov decision processes (MDPs) via simulation. Unfortunately, the parameters of the so-called “actor” in the classical actor-critic algorithm exhibit great volatility — getting unbounded in practice, whence they have to be artificially constrained to obtain solutions in practi...
Lunar Lander: A Continuous-Action Case Study for Policy-Gradient Actor-Critic Algorithms
We empirically investigate modifications and implementation techniques required to apply a policy-gradient actor-critic algorithm to reinforcement learning problems with continuous state and action spaces. As a test-bed, we introduce a new simulated task, which involves landing a lunar module in a simplified two-dimensional world. The empirical results demonstrate the importance of efficiently ...
Mean Actor Critic
We propose a new algorithm, Mean Actor-Critic (MAC), for discrete-action continuous-state reinforcement learning. MAC is a policy gradient algorithm that uses the agent’s explicit representation of all action values to estimate the gradient of the policy, rather than using only the actions that were actually executed. This significantly reduces variance in the gradient updates and removes the n...
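The estimator is easy to state for a log-linear softmax policy: the gradient sums grad_theta pi(a|s) * Q_hat(s, a) over every action rather than over sampled actions only. The sketch below is a hypothetical rendering of that idea; the names mac_gradient and Q_hat are ours, not the paper's.

```python
import numpy as np

# Hypothetical MAC-style gradient estimate for a log-linear softmax policy.
# Q_hat is any learned critic; the inner sum runs over ALL actions, not just
# the action actually executed, which lowers the estimator's variance.

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def mac_gradient(theta, states, Q_hat):
    """theta: (n_actions, n_features) weights; states: iterable of feature vectors."""
    grad = np.zeros_like(theta)
    for x in states:
        pi = softmax(theta @ x)
        for a in range(len(pi)):
            # grad_theta pi(a|s) for a softmax policy: pi_a (e_a - pi) x^T
            dpi = pi[a] * (np.eye(len(pi))[a] - pi)[:, None] * x[None, :]
            grad += dpi * Q_hat(x, a)   # sum_a grad pi(a|s) * Q(s, a)
    return grad / len(states)
```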
Reinforcement Learning in Continuous State and Action Spaces
Many traditional reinforcement-learning algorithms have been designed for problems with small finite state and action spaces. Learning in such discrete problems can be difficult, due to noise and delayed reinforcements. However, many real-world problems have continuous state or action spaces, which can make learning a good decision policy even more involved. In this chapter we discuss how to ...