Similar resources
Reinforcement Learning in Finite MDPs: PAC Analysis
We study the problem of learning near-optimal behavior in finite Markov Decision Processes (MDPs) with a polynomial number of samples. These “PAC-MDP” algorithms include the well-known E3 and R-MAX algorithms as well as the more recent Delayed Q-learning algorithm. We summarize the current state-of-the-art by presenting bounds for the problem in a unified theoretical framework. We also present a...
Software Development for Simulation of Reformer Furnace
In recent years, much research has addressed the effective use of natural gas; the first step in these processes is the conversion of natural gas to syngas. Natural gas reforming in a reformer furnace is commonly used for syngas and hydrogen production. In this paper, a Windows-based software package, RIPI-RefSim, is introduced. By using proper heat, mass, kinetic and thermodynamic models as w...
Reinforcement Learning in Finite MDPs: PAC Analysis
We study the problem of learning near-optimal behavior in finite Markov Decision Processes (MDPs) with a polynomial number of samples. These “PAC-MDP” algorithms include the well-known E3 and R-MAX algorithms as well as the more recent Delayed Q-learning algorithm. We summarize the current state-of-the-art by presenting bounds for the problem in a unified theoretical framework. A more refined an...
Qualitative Analysis of VASS-Induced MDPs
We consider infinite-state Markov decision processes (MDPs) that are induced by extensions of vector addition systems with states (VASS). Verification conditions for these MDPs are described by reachability and Büchi objectives w.r.t. given sets of control-states. We study the decidability of some qualitative versions of these objectives, i.e., the decidability of whether such objectives can be...
Analysis of methods for solving MDPs
New proofs for two extensions to value iteration are derived when the type of initialisation of the value function is considered. Theoretical requirements that guarantee the convergence of backward value iteration and weaker requirements for the convergence of backups based on best actions only are identified. Experimental results show that standard value iteration performs significantly faster...
Journal
Journal title: Journal of the Korea Academia-Industrial cooperation Society
Year: 2014
ISSN: 1975-4701
DOI: 10.5762/kais.2014.15.9.5480