Search results for: policy iterations

Number of results: 276392

Journal: Missouri Journal of Mathematical Sciences 1994

2011
Huizhen Yu Dimitri P. Bertsekas

We consider the stochastic shortest path problem, a classical finite-state Markovian decision problem with a termination state, and we propose new convergent Q-learning algorithms that combine elements of policy iteration and classical Q-learning/value iteration. These algorithms are related to the ones introduced by the authors for discounted problems in [BY10b]. The main difference from the s...

Journal: Journal of Functional Analysis 2023

It is shown that for any sufficiently regular even Minkowski valuation $\Phi$ which is homogeneous and intertwines rigid motions, and for any convex body $K$ in a smooth neighborhood of the unit ball, there exists a sequence of positive numbers $(\gamma_m)_{m=1}^{\infty}$ such that $(\gamma_m \Phi^m K)_{m=1}^{\infty}$ converges to the unit ball with respect to the Hausdorff metric.

In this paper, the optimal adaptive leader-follower consensus of linear continuous-time multi-agent systems is considered. The error dynamics of each player depend on its neighbors' information. A detailed analysis of online optimal leader-follower consensus under known and unknown dynamics is presented. The introduced reinforcement learning-based algorithms learn online the approximate solution...

Journal: Journal of Nonlinear Sciences and Applications 2016

2006
Abhijeet Bhorkar Abhay Karandikar Vivek S. Borkar

In this paper, we propose a power optimal opportunistic scheduling scheme for a multiuser single hop Time Division Multiple Access (TDMA) system. We formulate the problem of minimizing average transmission power subject to minimum rate constraints for individual users. We suggest a stochastic approximation based scheme to implement the policy and prove the convergence and stability of this algo...

Journal: Kybernetika 2010
Karel Sladký

In this note we focus attention on identifying optimal policies and on eliminating suboptimal policies minimizing optimality criteria in discrete-time Markov decision processes with finite state space and compact action set. We present a unified approach to value iteration algorithms that makes it possible to generate lower and upper bounds on optimal values, as well as on the current policy. Using the mod...

1999
Yishay Mansour Satinder P. Singh

Decision-making problems in uncertain or stochastic domains are often formulated as Markov decision processes (MDPs). Policy iteration (PI) is a popular algorithm for searching over policy-space, the size of which is exponential in the number of states. We are interested in bounds on the complexity of PI that do not depend on the value of the discount factor. In this paper we prove the first...

2010
Mohammad Ghavamzadeh Alessandro Lazaric Odalric-Ambrym Maillard Rémi Munos

We consider the problem of reinforcement learning in high-dimensional spaces when the number of features is bigger than the number of samples. In particular, we study the least-squares temporal difference (LSTD) learning algorithm when a space of low dimension is generated with a random projection from a high-dimensional space. We provide a thorough theoretical analysis of the LSTD with random p...
