Reinforcement Learning for MDPs with Constraints
نویسنده
چکیده
In this article, I will consider Markov Decision Processes with two criteria, each defined as the expected value of an infinite horizon cumulative return. The second criterion is either itself subject to an inequality constraint, or there is maximum allowable probability that the single returns violate the constraint. I describe and discuss three new reinforcement learning approaches for solving such control problems.
منابع مشابه
Convergent Reinforcement Learning for Hierarchical Reactive Plans
Hierarchical reinforcement learning techniques operate on structured plans. Although structured representations add expressive power to Markov Decision Processes (MDPs), current approaches impose constraints that force the associated convergence proofs to depend upon a subroutinestyle execution model that restricts adaptive response. We develop an alternate approach to convergent learning that ...
متن کاملUncertain Reward-Transition MDPs for Negotiable Reinforcement Learning
Uncertain Reward-Transition MDPs for Negotiable Reinforcement Learning
متن کاملMultiple-Goal Reinforcement Learning with Modular Sarsa(O)
We present a new algorithm, GM-Sarsa(O), for finding approximate solutions to multiple-goal reinforcement learning problems that are modeled as composite Markov decision processes. According to our formulation different sub-goals are modeled as MDPs that are coupled by the requirement that they share actions. Existing reinforcement learning algorithms address similar problem formulations by fir...
متن کاملReinforcement Learning in Large or Unknown MDPs
Reinforcement Learning in Large or Unknown MDPs
متن کاملA Generalized Reinforcement-Learning Model: Convergence and Applicationa
Reinforcement learning is the process by which an autonomous agent uses its experience interacting with an environment to improve its behavior. The Markov decision process (mdp) model is a popular way of formalizing the reinforcement-learning problem, but it is by no means the only way. In this paper, we show how many of the important theoretical results concerning reinforcement learning in mdp...
متن کامل