Search results for: bellman
Number of results: 4956
This paper deals with approximate value iteration (AVI) algorithms applied to discounted dynamic programming (DP) problems. For a fixed control policy, the span semi-norm of the so-called Bellman residual is shown to be convex in the Banach space of candidate solutions to the DP problem. This fact motivates the introduction of an AVI algorithm with local search that seeks to minimize the span s...
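The snippet stops before any formulas; for reference, the span semi-norm and fixed-policy Bellman residual it refers to are standardly written as follows (notation assumed here, not quoted from the paper):

\[
\operatorname{sp}(v) = \max_{s \in S} v(s) - \min_{s \in S} v(s),
\qquad
B_\pi v = r_\pi + \gamma P_\pi v - v ,
\]

so the convexity claim is that \( v \mapsto \operatorname{sp}(B_\pi v) \) is convex over candidate solutions v, which is what makes a local search that minimizes the span well behaved.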
This paper presents two universal algorithms for generalized discrete matrix Bellman equations with a symmetric Toeplitz matrix. The algorithms are semiring extensions of two well-known methods for solving Toeplitz systems in ordinary linear algebra.
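For intuition only, here is a minimal hypothetical sketch (not the paper's Toeplitz-specialized algorithms) of solving a discrete matrix Bellman equation X = (A ⊗ X) ⊕ B by simple iteration in the max-plus semiring, where ⊕ is max and ⊗ is +:

import numpy as np

NEG_INF = -np.inf  # additive identity of the max-plus semiring

def maxplus_matvec(A, x):
    """Max-plus product: (A ⊗ x)_i = max_j (A[i, j] + x[j])."""
    return np.max(A + x[None, :], axis=1)

def solve_bellman_maxplus(A, b, iters=100):
    """Iterate x <- (A ⊗ x) ⊕ b; converges to the least solution A* ⊗ b
    when A has no positive-weight cycles (toy solver, for illustration)."""
    x = b.copy()
    for _ in range(iters):
        x_new = np.maximum(maxplus_matvec(A, x), b)
        if np.array_equal(x_new, x):
            break
        x = x_new
    return x

# Toy symmetric Toeplitz matrix: entry (i, j) depends only on |i - j|
c = np.array([0.0, -1.0, -3.0])
A = np.array([[c[abs(i - j)] for j in range(3)] for i in range(3)])
b = np.array([0.0, NEG_INF, NEG_INF])
print(solve_bellman_maxplus(A, b))  # -> [ 0. -1. -2.]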
Some inequalities in 2-inner product spaces generalizing Bessel's result, analogous to the Boas-Bellman inequality in inner product spaces, are given. Applications to determinantal integral inequalities are also provided.
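For reference, the classical Boas-Bellman inequality in an inner product space, which these results parallel, states that for vectors x, y_1, ..., y_n:

\[
\sum_{i=1}^{n} |\langle x, y_i \rangle|^2
\;\le\; \|x\|^2 \left( \max_{1 \le i \le n} \|y_i\|^2
+ \Big( \sum_{1 \le i \ne j \le n} |\langle y_i, y_j \rangle|^2 \Big)^{1/2} \right),
\]

which reduces to Bessel's inequality when the y_i are orthonormal.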
We consider the stochastic optimal control problem for a McKean-Vlasov stochastic differential equation in which the coefficients may depend upon the joint law of the state and the control. By using feedback controls, we reformulate the problem as a deterministic control problem with only the marginal distribution of the process as the controlled state variable, and prove that the dynamic programming principle...
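The snippet does not display the model; one common formulation consistent with coefficients that depend on the joint law of state and control is (notation assumed):

\[
dX_t = b\big(X_t, \mathbb{P}_{(X_t, \alpha_t)}, \alpha_t\big)\,dt
     + \sigma\big(X_t, \mathbb{P}_{(X_t, \alpha_t)}, \alpha_t\big)\,dW_t ,
\]

and the reformulation described above then treats the flow of marginal laws \( t \mapsto \mathbb{P}_{X_t} \) as the state of a deterministic control problem.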
Multimedia applications are Quality of Service (QoS) sensitive, which makes QoS support indispensable in high-speed Integrated Services Packet Networks (ISPN). An important aspect is QoS routing, namely, the provision of QoS routes at session setup time based on user requests and information about available network resources. This paper develops optimal QoS routing algorithms within an Autonomo...
The trapdoor channel is a binary input/output/state channel whose state changes deterministically as the modulo-2 sum of the current input, output, and state. At each state, it behaves as one of two Z channels, each with crossover probability 1/2. Permuter et al. formulated the problem of finding the capacity of the trapdoor channel with feedback as a stochastic control problem. By solving the co...
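A minimal simulation sketch of the channel law just described (hypothetical helper, assuming the usual ball-passing interpretation in which the incoming and stored balls are emitted with equal probability when they differ):

import random

def trapdoor_channel(inputs, state=0, seed=0):
    """Simulate the binary trapdoor channel: when the incoming bit x
    differs from the stored state s, either value is emitted with
    probability 1/2 (a Z channel per state); the next state is the
    modulo-2 sum s' = x XOR y XOR s, exactly as described above."""
    rng = random.Random(seed)
    outputs = []
    for x in inputs:
        y = x if x == state else rng.randint(0, 1)  # emitted bit
        state = x ^ y ^ state                       # deterministic state update
        outputs.append(y)
    return outputs

print(trapdoor_channel([1, 0, 1, 1, 0]))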
(We will use this subroutine later on in the lecture for another algorithm, which is why we are defining it as a separate procedure). Informally, we think of d[v] as our current estimate for the shortest path from s to v. The algorithm begins by initializing each d[v] ← ∞, except for our source vertex s, which we initialize so that d[s] = 0 (trivially, the shortest path from s to s is length 0)...
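A sketch of the subroutine and one algorithm that typically uses it, Bellman-Ford (function names are mine, assuming the standard presentation):

import math

def initialize_single_source(vertices, s):
    """The subroutine from the notes: d[v] <- infinity for every v, d[s] <- 0."""
    d = {v: math.inf for v in vertices}
    d[s] = 0
    return d

def bellman_ford(vertices, edges, s):
    """Single-source shortest paths; edges is a list of (u, v, w) triples."""
    d = initialize_single_source(vertices, s)
    for _ in range(len(vertices) - 1):
        for u, v, w in edges:
            if d[u] + w < d[v]:  # relax edge (u, v)
                d[v] = d[u] + w
    return d

edges = [('s', 'a', 2), ('a', 'b', -1), ('s', 'b', 4)]
print(bellman_ford(['s', 'a', 'b'], edges, 's'))  # {'s': 0, 'a': 2, 'b': 1}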
This paper addresses the problem of batch Reinforcement Learning with Expert Demonstrations (RLED). In RLED, the goal is to find an optimal policy of a Markov Decision Process (MDP), using a data set of fixed sampled transitions of the MDP as well as a data set of fixed expert demonstrations. This is slightly different from the batch Reinforcement Learning (RL) framework where only fixed sample...
Recently, Sutton et al. (2015) introduced the emphatic temporal differences (ETD) algorithm for off-policy evaluation in Markov decision processes. In this short note, we show that the projected fixed-point equation that underlies ETD involves a contraction operator with a √γ-contraction modulus (where γ is the discount factor). This allows us to provide error bounds on the approximation erro...
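For reference, a contraction modulus of this kind yields the standard fixed-point error bound: if the projected operator ΠT is a √γ-contraction with fixed point v_ETD, the usual triangle-inequality argument gives (notation assumed, not quoted from the note)

\[
\| v_{\mathrm{ETD}} - v^{\pi} \| \;\le\; \frac{1}{1 - \sqrt{\gamma}}\, \| \Pi v^{\pi} - v^{\pi} \| ,
\]

i.e. the approximation error is within a 1/(1 - √γ) factor of the best error achievable in the projected space.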
This paper develops an inverse reinforcement learning algorithm aimed at recovering a reward function from the observed actions of an agent. We introduce a strategy to flexibly handle different types of actions with two approximations of the Bellman Optimality Equation, and a Bellman Gradient Iteration method to compute the gradient of the Q-value with respect to the reward function. These metho...
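A hypothetical sketch of the idea (the snippet does not show the paper's two specific approximations; here the hard max in the Bellman optimality equation is smoothed with a log-sum-exp so that Q and its gradient with respect to the reward weights can be iterated jointly):

import numpy as np

def bellman_gradient_iteration(P, r_weights, features, gamma=0.9, k=10.0, iters=200):
    """Smooth the Bellman optimality equation with a soft max and iterate
    Q together with dQ/dw (names and smoothing choice are assumptions).

    P:        (A, S, S) transitions, P[a, s, s'] = Pr(s' | s, a)
    features: (S, F) state features; reward r = features @ r_weights
    k:        soft-max sharpness (larger k ~ closer to the hard max)
    """
    A, S, _ = P.shape
    F = features.shape[1]
    r = features @ r_weights            # reward is linear in the weights
    Q = np.zeros((S, A))
    dQ = np.zeros((S, A, F))            # gradient of Q(s, a) w.r.t. r_weights
    for _ in range(iters):
        m = Q.max(axis=1, keepdims=True)
        w = np.exp(k * (Q - m))
        w /= w.sum(axis=1, keepdims=True)               # softmax over actions
        V = (m + np.log(np.exp(k * (Q - m)).sum(axis=1, keepdims=True)) / k).ravel()
        dV = np.einsum('sa,saf->sf', w, dQ)             # exact gradient of the soft max
        for a in range(A):
            Q[:, a] = r + gamma * P[a] @ V
            dQ[:, a, :] = features + gamma * P[a] @ dV
    return Q, dQ

rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(4), size=(2, 4))   # 2 actions, 4 states
Q, dQ = bellman_gradient_iteration(P, rng.standard_normal(4), np.eye(4))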