Constrained Bayesian Reinforcement Learning via Approximate Linear Programming
نویسندگان
چکیده
In this paper, we consider the safe learning scenario where we need to restrict the exploratory behavior of a reinforcement learning agent. Specifically, we treat the problem as a form of Bayesian reinforcement learning in an environment that is modeled as a constrained MDP (CMDP) where the cost function penalizes undesirable situations. We propose a model-based Bayesian reinforcement learning (BRL) algorithm for such an environment, eliciting risk-sensitive exploration in a principled way. Our algorithm efficiently solves the constrained BRL problem by approximate linear programming, and generates a finite state controller in an offline manner. We provide theoretical guarantees and demonstrate empirically that our approach outperforms the state of the art.
منابع مشابه
Linear Bayesian Reinforcement Learning
This paper proposes a simple linear Bayesian approach to reinforcement learning. We show that with an appropriate basis, a Bayesian linear Gaussian model is sufficient for accurately estimating the system dynamics, and in particular when we allow for correlated noise. Policies are estimated by first sampling a transition model from the current posterior, and then performing approximate dynamic ...
متن کاملCover tree Bayesian reinforcement learning
This paper proposes an online tree-based Bayesian approach for reinforcement learning. For inference, we employ a generalised context tree model. This defines a distribution on multivariate Gaussian piecewise-linear models, which can be updated in closed form. The tree structure itself is constructed using the cover tree method, which remains efficient in high dimensional spaces. We combine the...
متن کاملDual Control for Approximate Bayesian Reinforcement Learning
Control of non-episodic, finite-horizon dynamical systems with uncertain dynamics poses a tough and elementary case of the exploration-exploitation trade-off. Bayesian reinforcement learning, reasoning about the effect of actions and future observations, offers a principled solution, but is intractable. We review, then extend an old approximate approach from control theory—where the problem is ...
متن کاملLearning Heuristic Functions through Approximate Linear Programming
Planning problems are often formulated as heuristic search. The choice of the heuristic function plays a significant role in the performance of planning systems, but a good heuristic is not always available. We propose a new approach to learning heuristic functions from previously solved problem instances in a given domain. Our approach is based on approximate linear programming, commonly used ...
متن کاملAlgorithm-Directed Exploration for Model-Based Reinforcement Learning in Factored MDPs
One of the central challenges in reinforcement learning is to balance the exploration/exploitation tradeoff while scaling up to large problems. Although model-based reinforcement learning has been less prominent than value-based methods in addressing these challenges, recent progress has generated renewed interest in pursuing modelbased approaches: Theoretical work on the exploration/exploitati...
متن کامل