Search results for: policy iterations

Number of results: 276392

Journal: :Math. Oper. Res. 2013
Bruno Scherrer

Given a Markov Decision Process (MDP) with n states and m actions per state, we study the number of iterations needed by Policy Iteration (PI) algorithms to converge to the γ-discounted optimal policy. We consider two variations of PI: Howard’s PI that changes the actions in all states with a positive advantage, and Simplex-PI that only changes the action in the sta...
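
The Howard's PI rule described in this abstract (switch the action in every state that has a positive advantage) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the names `P`, `R`, `evaluate`, `howard_pi`, and the tolerance `eps` are hypothetical, the policy is evaluated by simple fixed-point iteration rather than an exact linear solve, and the two-state MDP is a toy example made up for this sketch.

```python
def evaluate(P, R, policy, gamma, tol=1e-12):
    """Approximate the value of a fixed policy by fixed-point iteration."""
    n = len(policy)
    v = [0.0] * n
    while True:
        new_v = [
            R[s][policy[s]]
            + gamma * sum(p * v[t] for t, p in enumerate(P[s][policy[s]]))
            for s in range(n)
        ]
        if max(abs(a - b) for a, b in zip(new_v, v)) < tol:
            return new_v
        v = new_v


def howard_pi(P, R, gamma, eps=1e-9):
    """Howard-style policy iteration on a finite discounted MDP.

    P[s][a][t] is the probability of moving from s to t under action a,
    R[s][a] is the immediate reward (both are assumed input conventions).
    Returns (policy, values, number of policy updates).
    """
    n, m = len(R), len(R[0])
    policy = [0] * n
    updates = 0
    while True:
        v = evaluate(P, R, policy, gamma)
        changed = False
        for s in range(n):
            # Action values of every action in state s under the current v.
            q = [
                R[s][a] + gamma * sum(p * v[t] for t, p in enumerate(P[s][a]))
                for a in range(m)
            ]
            best = max(range(m), key=lambda a: q[a])
            # Switch in every state whose best action has a positive advantage.
            if q[best] - v[s] > eps:
                policy[s] = best
                changed = True
        if not changed:
            return policy, v, updates
        updates += 1


# Toy MDP (an assumption for illustration): state 1 pays reward 1 for staying.
P = [[[1.0, 0.0], [0.0, 1.0]],   # state 0: action 0 stays, action 1 moves to 1
     [[0.0, 1.0], [1.0, 0.0]]]   # state 1: action 0 stays, action 1 moves to 0
R = [[0.0, 0.0], [1.0, 0.0]]
policy, values, updates = howard_pi(P, R, gamma=0.9)
```

On this toy instance the loop changes the action in state 0 once and then stops, since no state has a positive advantage any more.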

2004
S. Panwar

We generalize the classical max-min rate allocation policy with the support of the minimum rate requirement and peak rate constraint for each connection. Since a centralized algorithm for the generalized maxmin (GMM) rate allocation requires global information, which is difficult to maintain and manage in a large network, we develop a distributed protocol to achieve the GMM policy using the ava...

2015
Jessica Chemali Alessandro Lazaric

We consider the problem of learning the optimal policy of an unknown Markov decision process (MDP) when expert demonstrations are available along with interaction samples. We build on classification-based policy iteration to perform a seamless integration of interaction and expert data, thus obtaining an algorithm which can benefit from both sources of information at the same time. Furthermore,...

1998
Yiwei Thomas Hou Henry H.-Y. Tzeng Shivendra S. Panwar

We generalize the classical max-min rate allocation policy with the support of the minimum rate requirement and peak rate constraint for each connection. Since a centralized algorithm for the generalized maxmin (GMM) rate allocation requires global information, which is difficult to maintain and manage in a large network, we develop a distributed protocol to achieve the GMM policy using the avail...

2010
Dimitri P. Bertsekas

A counterexample due to Williams and Baird [WiB93] (Example 2 in their paper) is transcribed here in the context and notation of two papers by Bertsekas and Yu [BeY10a], [BeY10b], and it is also adapted to the case of Q-factor-based policy iteration. The example illustrates that cycling is possible in asynchronous policy iteration if the initial policy and cost/Q-factor iterations do not satisfy...

Journal: :CoRR 2017
Jae Young Lee Richard S. Sutton

Policy iteration (PI) is a recursive process of policy evaluation and improvement for solving an optimal decision-making problem, e.g., a reinforcement learning (RL) or optimal control problem, and has served as a foundation for developing RL methods. Motivated by integral PI (IPI) schemes in optimal control and RL methods in continuous time and space (CTS), this paper proposes on-policy IPI to solve the gene...

Journal: :Journal of Functional Analysis 1983

Journal: :AUC PHILOSOPHICA ET HISTORICA 2017

Journal: :Colloquium Mathematicum 1990

Journal: :Oper. Res. Lett. 2013
Eugene A. Feinberg Jefferson Huang

This note considers an average-cost Markov Decision Process (MDP) with finite state and action sets and satisfying the additional condition that there is a state to which the system jumps from any state and under any action with a positive probability. The main result is that the policy iteration algorithm is strongly polynomial for such MDPs, which are often used to model replacement and maintena...
