policy iterations

Search results for: policy iterations

Number of results: 276392

Improved and Generalized Upper Bounds on the Complexity of Policy Iteration

Journal: Math. Oper. Res. 2013

Bruno Scherrer

Given a Markov Decision Process (MDP) with n states and m actions per state, we study the number of iterations needed by Policy Iteration (PI) algorithms to converge to the optimal γ-discounted optimal policy. We consider two variations of PI: Howard’s PI that changes the actions in all states with a positive advantage, and Simplex-PI that only changes the action in the sta...
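The Howard variant described in this abstract can be illustrated with a minimal sketch. This is not code from the paper: the array layout (`P[a, s, t]` for transition probabilities, `R[a, s]` for rewards), the function names, the all-zeros starting policy, and the tolerance `tol` are all assumptions made for illustration. Each iteration evaluates the current policy exactly, then switches the action in every state whose best advantage is positive.

```python
import numpy as np

def evaluate_policy(P, R, policy, gamma):
    """Solve (I - gamma * P_pi) v = r_pi for the value of a fixed policy."""
    n = P.shape[1]
    states = np.arange(n)
    P_pi = P[policy, states]  # shape (n, n): row s is P[policy[s], s, :]
    r_pi = R[policy, states]  # shape (n,)
    return np.linalg.solve(np.eye(n) - gamma * P_pi, r_pi)

def howard_policy_iteration(P, R, gamma, tol=1e-10, max_iter=1000):
    """Howard-style PI: update every state that has a positive advantage.

    P has shape (m, n, n) and R has shape (m, n); this layout is an
    assumption of the sketch, not something fixed by the abstract.
    Returns the final policy, its value vector, and the iteration count.
    """
    n = P.shape[1]
    policy = np.zeros(n, dtype=int)  # arbitrary starting policy (assumption)
    for iteration in range(1, max_iter + 1):
        v = evaluate_policy(P, R, policy, gamma)
        # advantage[a, s] = Q(s, a) - v(s) under the current policy
        advantage = R + gamma * (P @ v) - v
        improvable = advantage.max(axis=0) > tol
        if not improvable.any():
            return policy, v, iteration
        # Switch to the best action only in states that can be improved
        policy = np.where(improvable, advantage.argmax(axis=0), policy)
    raise RuntimeError("policy iteration did not converge within max_iter")
```

The returned iteration count is the quantity whose worst case the abstract says the paper bounds. The tolerance guards against floating-point noise being read as a positive advantage.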


A Generalized Max-Min Rate Allocation Policy and Its Distributed Implementation Using the ABR Flow Control Mechanism

2004

S. Panwar

We generalize the classical max-min rate allocation policy with the support of the minimum rate requirement and peak rate constraint for each connection. Since a centralized algorithm for the generalized max-min (GMM) rate allocation requires global information, which is difficult to maintain and manage in a large network, we develop a distributed protocol to achieve the GMM policy using the ava...


Direct Policy Iteration with Demonstrations

2015

Jessica Chemali Alessandro Lazaric

We consider the problem of learning the optimal policy of an unknown Markov decision process (MDP) when expert demonstrations are available along with interaction samples. We build on classification-based policy iteration to perform a seamless integration of interaction and expert data, thus obtaining an algorithm which can benefit from both sources of information at the same time. Furthermore,...


A Generalized Max-Min Rate Allocation Policy and Its Distributed Implementation Using ABR Flow Control Mechanism

1998

Yiwei Thomas Hou Henry H.-Y. Tzeng Shivendra S. Panwar

We generalize the classical max-min rate allocation policy with the support of the minimum rate requirement and peak rate constraint for each connection. Since a centralized algorithm for the generalized max-min (GMM) rate allocation requires global information, which is difficult to maintain and manage in a large network, we develop a distributed protocol to achieve the GMM policy using the avail...


Williams-Baird Counterexample for Q-Factor Asynchronous Policy Iteration

2010

Dimitri P. Bertsekas

A counterexample due to Williams and Baird [WiB93] (Example 2 in their paper) is transcribed here in the context and notation of two papers by Bertsekas and Yu [BeY10a], [BeY10b], and it is also adapted to the case of Q-factor-based policy iteration. The example illustrates that cycling is possible in asynchronous policy iteration if the initial policy and cost/Q-factor iterations do not satisfy...


Integral Policy Iterations for Reinforcement Learning Problems in Continuous Time and Space

Journal: CoRR 2017

Jae Young Lee Richard S. Sutton

Policy iteration (PI) is a recursive process of policy evaluation and improvement for solving an optimal decision-making problem, e.g., a reinforcement learning (RL) or optimal control problem, and has served as a foundation for developing RL methods. Motivated by integral PI (IPI) schemes in optimal control and RL methods in continuous time and space (CTS), this paper proposes on-policy IPI to solve the gene...


The asymptotics of nonexpansive iterations

Journal: Journal of Functional Analysis 1983


Preserving measurability with Cohen iterations

Journal: AUC PHILOSOPHICA ET HISTORICA 2017


Cardinals and iterations of HOD

Journal: Colloquium Mathematicum 1990


Strong polynomiality of policy iterations for average-cost MDPs modeling replacement and maintenance problems

Journal: Oper. Res. Lett. 2013

Eugene A. Feinberg Jefferson Huang

This note considers an average-cost Markov Decision Process (MDP) with finite state and action sets and satisfying the additional condition that there is a state to which the system jumps from any state and under any action with a positive probability. The main result is that the policy iteration algorithm is strongly polynomial for such MDPs, which are often used to model replacement and maintena...

