Policy Learning for Robust Markov Decision Process with a Mismatched Generative Model
Authors
Abstract
In high-stake scenarios like medical treatment and auto-piloting, it's risky or even infeasible to collect online experimental data to train the agent. Simulation-based training can alleviate this issue, but may suffer from the inherent mismatches between the simulator and the real environment. It is therefore imperative to utilize the simulator to learn a robust policy for real-world deployment. In this work, we consider policy learning for Robust Markov Decision Processes (RMDP), where the agent tries to seek a robust policy with respect to unexpected perturbations on the environments. Specifically, we focus on the setting where the training environment can be characterized as a generative model, and a constrained perturbation can be added to the model during testing. Our goal is to identify a near-optimal robust policy for the perturbed testing environment, which introduces additional technical difficulties as we need to simultaneously estimate the uncertainty of the training environment from samples and find the worst-case perturbation for testing. To solve this, we propose a generic method that formalizes the perturbation as an opponent to obtain a two-player zero-sum game, and we further show that the Nash Equilibrium of this game corresponds to the robust policy. We prove that, with a polynomial number of samples from the generative model, our algorithm finds a near-optimal robust policy with high probability. Our method is able to deal with general perturbations under some mild assumptions and can also be extended to more complex problems such as the partially observable Markov decision process, thanks to the game-theoretical formulation.
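The max-min structure described in the abstract can be pictured with a small example. The sketch below is not the paper's algorithm; it is a minimal tabular robust value iteration, assuming an L1-ball uncertainty set of radius `radius` around a nominal transition model `P`, where the opponent's inner minimization has a well-known closed form that shifts probability mass toward the lowest-value next state.

```python
import numpy as np

def worst_case_transition(p, v, radius):
    """Closed-form inner minimisation for an L1 uncertainty set:
    move up to radius/2 of probability mass from the highest-value
    next states onto the lowest-value next state."""
    p = p.copy()
    worst = np.argmin(v)
    budget = radius / 2.0           # moving mass delta costs 2*delta in L1
    for s in np.argsort(v)[::-1]:   # highest-value states first
        if budget <= 0 or v[s] <= v[worst]:
            break
        take = min(p[s], budget)
        p[s] -= take
        p[worst] += take
        budget -= take
    return p

def robust_value_iteration(P, R, gamma=0.95, radius=0.1, iters=500):
    """Tabular max-min backup: the agent maximises over actions while
    the opponent minimises over kernels in an L1 ball around P[s, a].
    P has shape (S, A, S); R has shape (S, A)."""
    S, A, _ = P.shape
    v = np.zeros(S)
    for _ in range(iters):
        q = np.array([[R[s, a] + gamma * worst_case_transition(P[s, a], v, radius) @ v
                       for a in range(A)] for s in range(S)])
        v = q.max(axis=1)
    return v, q.argmax(axis=1)
```

Here the opponent's best response is available in closed form; the paper's setting is more general, treating the perturbation as a learned opponent in a zero-sum game rather than restricting it to a fixed-shape uncertainty set.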
منابع مشابه
Robust partially observable Markov decision process
We seek to find the robust policy that maximizes the expected cumulative reward for the worst case when a partially observable Markov decision process (POMDP) has uncertain parameters whose values are only known to be in a given region. We prove that the robust value function, which represents the expected cumulative reward that can be obtained with the robust policy, is convex with respect to ...
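As a point of reference for the POMDP setting, the snippet below is a minimal Bayes belief filter, assuming hypothetical arrays `T[a]` (state transition matrices) and `O[a]` (observation likelihoods). Robust POMDP methods operate on belief vectors like this, with the robust value function convex in the belief.

```python
import numpy as np

def belief_update(b, a, o, T, O):
    """One Bayes-filter step for a POMDP belief vector b over states.
    T[a]: S x S transition matrix; O[a]: S x num_obs observation
    likelihoods. (Hypothetical array layout, for illustration only.)"""
    b_pred = T[a].T @ b            # predict through the dynamics
    b_new = O[a][:, o] * b_pred    # weight by observation likelihood
    return b_new / b_new.sum()     # normalise back to a distribution
```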
Optimistic Planning in Markov Decision Processes Using a Generative Model
We consider the problem of online planning in a Markov decision process with discounted rewards for any given initial state. We consider the PAC sample complexity problem of computing, with probability 1−δ, an ε-optimal action using the smallest possible number of calls to the generative model (which provides reward and next-state samples). We design an algorithm, called StOP (for StochasticOpt...
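The "calls to the generative model" can be pictured with a naive baseline (this is not StOP): a plain Monte Carlo rollout estimator, assuming a hypothetical sampler `step(s, a)` that returns a (reward, next_state) pair and a hypothetical `policy` callable.

```python
def mc_value_estimate(step, policy, s0, gamma=0.95, horizon=50, n_rollouts=200):
    """Naive Monte Carlo baseline: estimate the discounted return from s0
    using only (reward, next_state) samples drawn from the generative model.
    `step` and `policy` are hypothetical callables, for illustration."""
    total = 0.0
    for _ in range(n_rollouts):
        s, ret, discount = s0, 0.0, 1.0
        for _ in range(horizon):
            r, s = step(s, policy(s))
            ret += discount * r
            discount *= gamma
        total += ret
    return total / n_rollouts
```

Optimistic planners such as StOP aim to spend far fewer generative-model calls than this uniform rollout scheme by concentrating samples on promising action sequences.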
Model-based Bayesian Reinforcement Learning in Factored Markov Decision Process
Learning the enormous number of parameters is a challenging problem in model-based Bayesian reinforcement learning. In order to solve the problem, we propose a model-based factored Bayesian reinforcement learning (F-BRL) approach. F-BRL exploits a factored representation to describe states to reduce the number of parameters. Representing the conditional independence relationships between state ...
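To illustrate why factoring shrinks the parameter count, here is a sketch (not F-BRL itself) of Dirichlet transition counts kept per state variable and parent assignment rather than per joint state; `parents` is a hypothetical structure mapping each variable to the indices it depends on.

```python
from collections import defaultdict

class FactoredDirichletCounts:
    """Dirichlet-multinomial counts per next-state variable, conditioned
    on that variable's parents and the action, instead of the full joint
    state. Parameters scale with parent-assignment sizes, not |S|."""
    def __init__(self, parents, prior=1.0):
        self.parents = parents     # parents[i]: indices feeding variable i
        self.prior = prior
        self.counts = defaultdict(lambda: defaultdict(float))

    def observe(self, s, a, s_next):
        for i, v in enumerate(s_next):
            key = (i, tuple(s[j] for j in self.parents[i]), a)
            self.counts[key][v] += 1.0

    def posterior_prob(self, i, s, a, v, domain_size):
        """Posterior predictive probability that variable i takes value v."""
        key = (i, tuple(s[j] for j in self.parents[i]), a)
        c = self.counts[key]
        return (c[v] + self.prior) / (sum(c.values()) + self.prior * domain_size)
```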
A Robust Constrained Markov Decision Process Model for Admission Control in a Single Server Queue
This paper presents a robust optimization approach for discounted constrained Markov decision processes with payoff uncertainty. It is assumed that the decision-maker has no distributional information on the unknown payoffs. Two types of uncertainty sets, convex hulls and intervals, are considered. Interval uncertainty sets are parametrized, allowing a subset of the payoffs to vary within interva...
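For the interval case, worst-case evaluation of a fixed policy is simple because the value is linear and increasing in the payoffs; a minimal sketch, assuming per-state reward intervals with lower endpoints `r_lo`:

```python
import numpy as np

def worst_case_policy_value(P_pi, r_lo, gamma=0.95):
    """Worst-case discounted value of a fixed policy under box (interval)
    payoff uncertainty: v = sum_t gamma^t P_pi^t r has nonnegative
    coefficients in r, so the minimum is attained at the lower endpoints.
    Solves (I - gamma * P_pi) v = r_lo for the policy's S x S matrix P_pi."""
    S = P_pi.shape[0]
    return np.linalg.solve(np.eye(S) - gamma * P_pi, r_lo)
```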
Reinforcement Learning in Robust Markov Decision Processes
An important challenge in Markov decision processes is to ensure robustness with respect to unexpected or adversarial system behavior while taking advantage of well-behaving parts of the system. We consider a problem setting where some unknown parts of the state space can have arbitrary transitions while other parts are purely stochastic. We devise an algorithm that is adaptive to potentially a...
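This mixed setting can be sketched with one value backup in which states flagged as adversarial are assumed to transition arbitrarily, and therefore receive the globally worst next-state value, while the remaining states use their stochastic model; `adversarial` is a hypothetical boolean mask, for illustration only.

```python
import numpy as np

def mixed_backup(v, P, R, adversarial, gamma=0.95):
    """One Bellman backup for a mixed MDP: adversarial states may move
    anywhere, so their continuation value is the global worst case;
    purely stochastic states use their known kernel P[s, a]."""
    S, A, _ = P.shape
    q = np.empty((S, A))
    for s in range(S):
        for a in range(A):
            cont = v.min() if adversarial[s] else P[s, a] @ v
            q[s, a] = R[s, a] + gamma * cont
    return q.max(axis=1)
```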
Journal
Journal title: Proceedings of the AAAI Conference on Artificial Intelligence
Year: 2022
ISSN: 2159-5399, 2374-3468
DOI: https://doi.org/10.1609/aaai.v36i7.20705