policy iterations

Search results for: policy iterations

Number of results: 276392

Iterations on Convex Quadrilaterals

Journal: Missouri Journal of Mathematical Sciences 1994

LIDS Report 2871: Q-Learning and Policy Iteration Algorithms for Stochastic Shortest Path Problems

2011

Huizhen Yu, Dimitri P. Bertsekas

We consider the stochastic shortest path problem, a classical finite-state Markovian decision problem with a termination state, and we propose new convergent Q-learning algorithms that combine elements of policy iteration and classical Q-learning/value iteration. These algorithms are related to the ones introduced by the authors for discounted problems in [BY10b]. The main difference from the s...
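The abstract positions the new algorithms against classical Q-learning/value iteration. Purely as a reference point, here is a minimal sketch of classical value iteration on Q-factors for a tiny, made-up stochastic shortest path problem with a cost-free termination state. It is not the algorithm proposed by Yu and Bertsekas; the transition table and the names `q_value_iteration` and `TERMINAL` are hypothetical.

```python
# Hypothetical 3-state SSP: states 0 and 1 are non-terminal, state 2 is the
# cost-free termination state. transitions[s][a] lists (probability, next_state, cost)
# triples; all numbers are made up for illustration.
TERMINAL = 2
transitions = {
    0: {"go": [(0.9, 1, 2.0), (0.1, 0, 2.0)],
        "jump": [(0.5, TERMINAL, 5.0), (0.5, 0, 5.0)]},
    1: {"go": [(1.0, TERMINAL, 1.0)],
        "stay": [(1.0, 1, 0.5)]},  # improper action: never terminates, cost accumulates
}


def q_value_iteration(transitions, terminal, tol=1e-10, max_iters=10_000):
    """Iterate Q(s,a) = sum_j p(j|s,a) * (g(s,a,j) + min_b Q(j,b)), with J(terminal) = 0."""
    q = {s: {a: 0.0 for a in acts} for s, acts in transitions.items()}
    for _ in range(max_iters):
        # Cost-to-go implied by the current Q-factors; the terminal state costs nothing.
        j = {s: min(qa.values()) for s, qa in q.items()}
        j[terminal] = 0.0
        new_q = {
            s: {a: sum(p * (cost + j[nxt]) for p, nxt, cost in outcomes)
                for a, outcomes in acts.items()}
            for s, acts in transitions.items()
        }
        delta = max(abs(new_q[s][a] - q[s][a]) for s in q for a in q[s])
        q = new_q
        if delta < tol:
            break
    return q


q = q_value_iteration(transitions, TERMINAL)
policy = {s: min(qa, key=qa.get) for s, qa in q.items()}  # greedy (cost-minimizing) actions
print(policy)  # → {0: 'go', 1: 'go'}
```

Here the optimal cost from state 0 works out to 29/9: choosing "go" gives J(0) = 2.9 + 0.1 J(0).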

Iterations of Minkowski valuations

Journal: Journal of Functional Analysis 2023

It is shown that for any sufficiently regular even Minkowski valuation $\Phi$ which is homogeneous and intertwines rigid motions, and any convex body $K$ in a smooth neighborhood of the unit ball, there exists a sequence of positive numbers $(\gamma_m)_{m=1}^{\infty}$ such that $(\gamma_m \Phi^m K)_{m=1}^{\infty}$ converges to a ball with respect to the Hausdorff metric.

Optimal adaptive leader-follower consensus of linear multi-agent systems: Known and unknown dynamics

Journal: Journal of Artificial Intelligence and Data Mining 2015

F. Tatari, M. B. Naghibi-Sistani,

In this paper, the optimal adaptive leader-follower consensus of linear continuous-time multi-agent systems is considered. The error dynamics of each player depend on its neighbors’ information. A detailed analysis of online optimal leader-follower consensus under known and unknown dynamics is presented. The introduced reinforcement learning-based algorithms learn online the approximate solution...
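In much of the cooperative-control literature, error dynamics that "depend on neighbors' information" are built from a local neighborhood tracking error that uses only the states of an agent's graph neighbors and, for agents pinned to the leader, the leader's state. Assuming this paper uses that standard definition (the abstract does not state it), a minimal sketch for scalar states follows; the function name and the example graph are hypothetical.

```python
def local_neighborhood_errors(x, x_leader, adjacency, pinning):
    """Local neighborhood tracking error for scalar agent states (assumed definition):

    e_i = sum_j a_ij * (x_i - x_j) + b_i * (x_i - x_0),
    where a_ij > 0 if follower i receives follower j's state, and b_i > 0
    if follower i receives the leader's state x_0 directly.
    """
    n = len(x)
    return [
        sum(adjacency[i][j] * (x[i] - x[j]) for j in range(n))
        + pinning[i] * (x[i] - x_leader)
        for i in range(n)
    ]


# Hypothetical example: followers 0-1-2 on an undirected path; only follower 0 sees the leader.
adjacency = [[0, 1, 0],
             [1, 0, 1],
             [0, 1, 0]]
pinning = [1, 0, 0]
errors = local_neighborhood_errors([1.0, 2.0, 4.0], 0.0, adjacency, pinning)
print(errors)  # → [0.0, -1.0, 2.0]
```

Each error uses only locally available information, which is why consensus to the leader corresponds to driving every e_i to zero on a connected, suitably pinned graph.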

Iterations of Curvature Images

Journal: Mathematika 2020

Biomorphs via modified iterations

Journal: Journal of Nonlinear Sciences and Applications 2016

Power Optimal Opportunistic Scheduling in Fading Wireless Channel

2006

Abhijeet Bhorkar, Abhay Karandikar, Vivek S. Borkar

In this paper, we propose a power-optimal opportunistic scheduling scheme for a multiuser single-hop Time Division Multiple Access (TDMA) system. We formulate the problem of minimizing average transmission power subject to minimum rate constraints for individual users. We suggest a stochastic-approximation-based scheme to implement the policy and prove the convergence and stability of this algo...

Identification of optimal policies in Markov decision processes

Journal: Kybernetika 2010

Karel Sladký

In this note we focus on identifying optimal policies and on eliminating suboptimal policies when minimizing optimality criteria in discrete-time Markov decision processes with a finite state space and a compact action set. We present a unified approach to value iteration algorithms that makes it possible to generate lower and upper bounds on the optimal values, as well as on the current policy. Using the mod...
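The abstract mentions value iteration that also yields lower and upper bounds on the optimal values. One classical construction for a discounted cost criterion is the MacQueen bounds; the sketch below assumes that construction and a discounted criterion (the abstract does not say which bounds or criteria the paper uses). The 2-state MDP and the function name are made up for illustration.

```python
# Hypothetical 2-state, 2-action discounted cost MDP (numbers are made up).
# costs[s][a] is the one-step cost; probs[s][a][t] is P(next state t | s, a).
costs = [[1.0, 2.0], [0.5, 3.0]]
probs = [[[0.5, 0.5], [1.0, 0.0]], [[0.0, 1.0], [0.7, 0.3]]]


def value_iteration_with_bounds(costs, probs, beta, tol=1e-8, max_iters=10_000):
    """Value iteration for discounted cost minimization with MacQueen-style bounds.

    After each update v_{n+1} = T v_n, componentwise
      v_{n+1} + beta/(1-beta) * min(v_{n+1} - v_n) <= v*
      v* <= v_{n+1} + beta/(1-beta) * max(v_{n+1} - v_n).
    Stops once the two bounds are within tol of each other.
    """
    n = len(costs)
    factor = beta / (1.0 - beta)
    v = [0.0] * n
    lower, upper = v[:], v[:]
    for _ in range(max_iters):
        # Bellman update: cheapest one-step cost plus discounted expected cost-to-go.
        new_v = [
            min(costs[s][a] + beta * sum(p * v[t] for t, p in enumerate(probs[s][a]))
                for a in range(len(costs[s])))
            for s in range(n)
        ]
        diffs = [new_v[s] - v[s] for s in range(n)]
        lower = [x + factor * min(diffs) for x in new_v]
        upper = [x + factor * max(diffs) for x in new_v]
        v = new_v
        if max(hi - lo for hi, lo in zip(upper, lower)) < tol:
            break
    return v, lower, upper


v, lower, upper = value_iteration_with_bounds(costs, probs, beta=0.9)
# The midpoint of the bounds can be a much better estimate of v* than the iterate v.
estimate = [(lo + hi) / 2 for lo, hi in zip(lower, upper)]
```

In this example the optimal values are 65/11 and 5, and the bounds close in on them well before the iterate `v` itself does, which is what makes such bounds useful as a stopping rule.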

On the Complexity of Policy Iteration

1999

Yishay Mansour, Satinder P. Singh

Decision-making problems in uncertain or stochastic domains are often formulated as Markov decision processes (MDPs). Policy iteration (PI) is a popular algorithm for searching over policy space, the size of which is exponential in the number of states. We are interested in bounds on the complexity of PI that do not depend on the value of the discount factor. In this paper we prove the first...
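For reference, here is a minimal sketch of standard (Howard-style) policy iteration on a small discounted cost MDP: exact policy evaluation by solving a linear system, then greedy improvement, repeated until the policy stops changing. The MDP numbers and function names are made up for illustration, and the iteration counter only shows the quantity that complexity bounds for PI are about; it is not the paper's analysis.

```python
# Hypothetical 2-state, 2-action discounted cost MDP (numbers are made up).
costs = [[1.0, 2.0], [0.5, 3.0]]
probs = [[[0.5, 0.5], [1.0, 0.0]], [[0.0, 1.0], [0.7, 0.3]]]


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))) / m[i][i]
    return x


def policy_iteration(costs, probs, beta, initial=None):
    """Policy iteration for discounted cost minimization; returns (policy, values, iterations)."""
    n = len(costs)
    policy = list(initial) if initial is not None else [0] * n
    iterations = 0
    while True:
        iterations += 1
        # Policy evaluation: solve (I - beta * P_pi) v = c_pi exactly.
        a = [[(1.0 if s == t else 0.0) - beta * probs[s][policy[s]][t] for t in range(n)]
             for s in range(n)]
        v = solve(a, [costs[s][policy[s]] for s in range(n)])

        def q(s, act):
            # One-step lookahead cost of taking `act` in `s`, then following v.
            return costs[s][act] + beta * sum(p * v[t] for t, p in enumerate(probs[s][act]))

        # Policy improvement: switch only on strict improvement to avoid cycling on ties.
        new_policy = []
        for s in range(n):
            best = min(range(len(costs[s])), key=lambda act: q(s, act))
            keep = q(s, policy[s]) <= q(s, best) + 1e-12
            new_policy.append(policy[s] if keep else best)
        if new_policy == policy:
            return policy, v, iterations
        policy = new_policy


policy, v, iterations = policy_iteration(costs, probs, beta=0.9, initial=[1, 1])
print(policy, iterations)  # → [0, 0] 2
```

Each iteration searches policy space by moving to a policy that is at least as good in every state, so PI terminates after finitely many iterations; how many it can take in the worst case is the question the paper studies.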

LSTD with Random Projections

2010

Mohammad Ghavamzadeh, Alessandro Lazaric, Odalric-Ambrym Maillard, Rémi Munos

We consider the problem of reinforcement learning in high-dimensional spaces when the number of features is larger than the number of samples. In particular, we study the least-squares temporal difference (LSTD) learning algorithm when a space of low dimension is generated with a random projection from a high-dimensional space. We provide a thorough theoretical analysis of the LSTD with random p...
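As a hedged sketch of the general idea (plain LSTD run on randomly projected features), assuming a Gaussian projection matrix with i.i.d. N(0, 1/d) entries, a common choice that the abstract does not confirm: the high-dimensional features are mapped to d dimensions and LSTD solves A θ = b in the small space. This is not necessarily the exact variant analyzed in the paper, and every function name here is hypothetical.

```python
import random


def random_projection(d, big_d, seed=0):
    """Hypothetical helper: a d x D matrix with i.i.d. Gaussian N(0, 1/d) entries."""
    rng = random.Random(seed)
    std = 1.0 / d ** 0.5
    return [[rng.gauss(0.0, std) for _ in range(big_d)] for _ in range(d)]


def project(matrix, phi):
    """Map a D-dimensional feature vector to d dimensions: psi = matrix @ phi."""
    return [sum(m * f for m, f in zip(row, phi)) for row in matrix]


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))) / m[i][i]
    return x


def lstd_random_projection(samples, features, gamma, matrix):
    """Plain LSTD on projected features psi(s) = matrix @ features(s).

    samples: list of (s, r, s_next) transitions. Solves A theta = b with
    A = sum psi(s) (psi(s) - gamma * psi(s_next))^T and b = sum psi(s) * r.
    """
    d = len(matrix)
    a = [[0.0] * d for _ in range(d)]
    b = [0.0] * d
    for s, r, s_next in samples:
        psi = project(matrix, features(s))
        psi_next = project(matrix, features(s_next))
        for i in range(d):
            b[i] += psi[i] * r
            for j in range(d):
                a[i][j] += psi[i] * (psi[j] - gamma * psi_next[j])
    return solve(a, b)


# Toy example: deterministic cycle 0 -> 1 -> 2 -> 0, reward 1 when leaving state 0.
def one_hot(s, n=3):
    return [1.0 if i == s else 0.0 for i in range(n)]


samples = [(0, 1.0, 1), (1, 0.0, 2), (2, 0.0, 0)]
matrix = random_projection(d=3, big_d=3, seed=0)
theta = lstd_random_projection(samples, one_hot, 0.5, matrix)
values = [sum(t * p for t, p in zip(theta, project(matrix, one_hot(s)))) for s in range(3)]
```

With d equal to D, the projection is almost surely invertible and the predictions recover the true values 8/7, 2/7 and 4/7 of this cycle with discount 0.5. The regime the abstract targets is d smaller than D, where the projected space can only approximate the value function.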
