نتایج جستجو برای: policy iterations
تعداد نتایج: 276392 فیلتر نتایج به سال:
We consider the stochastic shortest path problem, a classical finite-state Markovian decision problem with a termination state, and we propose new convergent Q-learning algorithms that combine elements of policy iteration and classical Q-learning/value iteration. These algorithms are related to the ones introduced by the authors for discounted problems in [BY10b]. The main difference from the s...
It is shown that for any sufficiently regular even Minkowski valuation Φ which homogeneous and intertwines rigid motions, convex body K in a smooth neighborhood of the unit ball, there exists sequence positive numbers (γm)m=1∞ such (γmΦmK)m=1∞ converges to ball with respect Hausdorff metric.
Optimal adaptive leader-follower consensus of linear multi-agent systems: Known and unknown dynamics
In this paper, the optimal adaptive leader-follower consensus of linear continuous time multi-agent systems is considered. The error dynamics of each player depends on its neighbors’ information. Detailed analysis of online optimal leader-follower consensus under known and unknown dynamics is presented. The introduced reinforcement learning-based algorithms learn online the approximate solution...
In this paper, we propose a power optimal opportunistic scheduling scheme for a multiuser single hop Time Division Multiple Access (TDMA) system. We formulate the problem of minimizing average transmission power subject to minimum rate constraints for individual users. We suggest a stochastic approximation based scheme to implement the policy and prove the convergence and stability of this algo...
In this note we focus attention on identifying optimal policies and on elimination suboptimal policies minimizing optimality criteria in discrete-time Markov decision processes with finite state space and compact action set. We present unified approach to value iteration algorithms that enables to generate lower and upper bounds on optimal values, as well as on the current policy. Using the mod...
Decision-making problems in uncertain or stochastic domains are often formulated as Markov decision processes (MD Ps). Pol icy iteration (PI) is a popular algorithm for searching over policy-space, the size of which is exponential in the number of states. We are interested in bounds on the complexity of PI that do not depend on the value of the discount factor. In this paper we prove the first...
We consider the problem of reinforcement learning in high-dimensional spaces when the number of features is bigger than the number of samples. In particular, we study the least-squares temporal difference (LSTD) learning algorithm when a space of low dimension is generated with a random projection from a highdimensional space. We provide a thorough theoretical analysis of the LSTD with random p...
نمودار تعداد نتایج جستجو در هر سال
با کلیک روی نمودار نتایج را به سال انتشار فیلتر کنید