Online Reinforcement Learning of Optimal Threshold Policies for Markov Decision Processes
Abstract
To overcome the curses of dimensionality and modeling of dynamic programming methods for solving Markov decision process problems, reinforcement learning (RL) methods are adopted in practice. Contrary to traditional RL algorithms, which do not consider the structural properties of the optimal policy, we propose a structure-aware learning algorithm to exploit the ordered multithreshold structure of the optimal policy, if any. We prove the asymptotic convergence of the proposed algorithm to the optimal policy. Due to the reduction in the policy space, the proposed algorithm provides remarkable improvements in storage and computational complexities over classical RL algorithms. Simulation results establish that the proposed algorithm converges faster than other RL algorithms.
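The storage saving mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's learning algorithm. It only assumes integer-indexed states and an ordered multithreshold policy in which the action index increases each time the state crosses a threshold. The names `threshold_policy`, `num_states`, and the threshold values are hypothetical and chosen for illustration.

```python
from bisect import bisect_right


def threshold_policy(thresholds):
    """Build a policy from ordered thresholds (assumed representation).

    The action for a state is the number of thresholds that are
    less than or equal to that state, so K - 1 thresholds encode
    a policy over K actions.
    """
    ordered = sorted(thresholds)

    def policy(state):
        return bisect_right(ordered, state)

    return policy


# Hypothetical problem size and thresholds, for illustration only.
num_states = 1000
thresholds = [200, 650]

policy = threshold_policy(thresholds)

# A tabular policy would store one action per state.
table = [policy(s) for s in range(num_states)]

# The structured form stores only the thresholds.
stored_values_tabular = len(table)
stored_values_threshold = len(thresholds)
```

Under these assumptions the structured policy keeps two numbers where the tabular form keeps one entry per state, which is the kind of reduction in policy space the abstract refers to.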
Similar Resources
Reinforcement Learning and Markov Decision Processes
Situated in between supervised learning and unsupervised learning, the paradigm of reinforcement learning deals with learning in sequential decision making problems in which there is limited feedback. This text introduces the intuitions and concepts behind Markov decision processes and two classes of algorithms for computing optimal behaviors: reinforcement learning and dynamic programming. Fir...
Reinforcement Learning in Robust Markov Decision Processes
An important challenge in Markov decision processes is to ensure robustness with respect to unexpected or adversarial system behavior while taking advantage of well-behaving parts of the system. We consider a problem setting where some unknown parts of the state space can have arbitrary transitions while other parts are purely stochastic. We devise an algorithm that is adaptive to potentially a...
Reinforcement Learning Based Algorithms for Average Cost Markov Decision Processes
This article proposes several two-timescale simulation-based actor-critic algorithms for solution of infinite horizon Markov Decision Processes with finite state-space under the average cost criterion. Two of the algorithms are for the compact (non-discrete) action setting while the rest are for finite-action spaces. On the slower timescale, all the algorithms perform a gradient search over cor...
Learning Policies for Markov Decision Processes from Data
We consider the problem of learning a policy for a Markov decision process consistent with data captured on the state-actions pairs followed by the policy. We assume that the policy belongs to a class of parameterized policies which are defined using features associated with the state-action pairs. The features are known a priori, however, only an unknown subset of them could be relevant. The p...
Journal
Journal title: IEEE Transactions on Automatic Control
Year: 2022
ISSN: 0018-9286, 1558-2523, 2334-3303
DOI: https://doi.org/10.1109/tac.2021.3108121