Search results for: bellman

Number of results: 4956

Journal: European Journal of Operational Research 2011
Edilson Fernandes de Arruda, Marcelo D. Fragoso, João Bosco Ribeiro do Val

This paper deals with approximate value iteration (AVI) algorithms applied to discounted dynamic programming (DP) problems. For a fixed control policy, the span semi-norm of the so-called Bellman residual is shown to be convex in the Banach space of candidate solutions to the DP problem. This fact motivates the introduction of an AVI algorithm with local search that seeks to minimize the span s...
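The Bellman residual of a candidate value function V is TV − V, where T is the Bellman operator. A minimal sketch of discounted value iteration that monitors the span semi-norm of this residual as a stopping criterion (the toy MDP below is an illustrative assumption, not the paper's setting):

```python
import numpy as np

# Toy discounted MDP (illustrative): 2 actions, 3 states.
# P[a, s, s'] is the probability of moving from s to s' under action a.
P = np.array([[[0.9, 0.1, 0.0],
               [0.0, 0.8, 0.2],
               [0.1, 0.0, 0.9]],
              [[0.2, 0.8, 0.0],
               [0.3, 0.0, 0.7],
               [0.0, 0.5, 0.5]]])
R = np.array([[1.0, 0.0, 2.0],   # R[a, s]: reward for action a in state s
              [0.5, 1.5, 0.0]])
gamma = 0.9

def bellman(V):
    # (T V)(s) = max_a [ R(s, a) + gamma * sum_s' P(s'|s, a) V(s') ]
    return (R + gamma * (P @ V)).max(axis=0)

def span(x):
    # Span semi-norm: max(x) - min(x); zero on constant vectors.
    return x.max() - x.min()

V = np.zeros(3)
for _ in range(10_000):
    TV = bellman(V)
    if span(TV - V) < 1e-10:   # stop once the residual is nearly constant
        break
    V = TV
```

With a discount factor below 1, T is a contraction, so the iterates converge and the span of the residual shrinks geometrically.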

2008
Sergĕı Sergeev

This paper presents two universal algorithms for generalized discrete matrix Bellman equations with a symmetric Toeplitz matrix. The algorithms are semiring extensions of two well-known methods for solving Toeplitz systems in ordinary linear algebra.

2003
S. S. Dragomir, Y. J. Cho, S. S. Kim, A. Sofo

Some inequalities in 2-inner product spaces that generalize Bessel's result and are similar to the Boas-Bellman inequality from inner product spaces are given. Applications to determinantal integral inequalities are also provided.

2017
Huyên Pham, Xiaoli Wei

We consider the stochastic optimal control problem of a McKean-Vlasov stochastic differential equation in which the coefficients may depend on the joint law of the state and control. Using feedback controls, we reformulate the problem as a deterministic control problem with only the marginal distribution of the process as the controlled state variable, and prove that the dynamic programming principle...

1998
Dirceu Cavendish, Mario Gerla

Multimedia applications are Quality of Service (QoS) sensitive, which makes QoS support indispensable in high-speed Integrated Services Packet Networks (ISPN). An important aspect is QoS routing, namely the provision of QoS routes at session set-up time based on user requests and information about available network resources. This paper develops optimal QoS routing algorithms within an Autonomo...

2016
Jui Wu, Achilleas Anastasopoulos

The trapdoor channel is a binary input/output/state channel whose state changes deterministically as the modulo-2 sum of the current input, output, and state. At each state, it behaves as one of two Z channels, each with crossover probability 1/2. Permuter et al. formulated the problem of finding the capacity of the trapdoor channel with feedback as a stochastic control problem. By solving the co...

2017
Debmalya Panigrahi, Tianqi Song, Tianyu Wang

(We will use this subroutine later on in the lecture for another algorithm, which is why we are defining it as a separate procedure). Informally, we think of d[v] as our current estimate for the shortest path from s to v. The algorithm begins by initializing each d[v] ← ∞, except for our source vertex s, which we initialize so that d[s] = 0 (trivially, the shortest path from s to s is length 0)...
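The initialization described above is the opening step of the Bellman-Ford single-source shortest-path algorithm. A self-contained sketch, with the edge-list encoding chosen here as an assumption for illustration:

```python
import math

def bellman_ford(num_vertices, edges, s):
    """Single-source shortest paths from s; edges is a list of (u, v, w)."""
    d = [math.inf] * num_vertices
    d[s] = 0  # trivially, the shortest path from s to s has length 0
    # Relax every edge |V| - 1 times; each pass can only improve estimates.
    for _ in range(num_vertices - 1):
        for u, v, w in edges:
            if d[u] + w < d[v]:
                d[v] = d[u] + w
    # One extra pass: any further improvement implies a negative cycle.
    for u, v, w in edges:
        if d[u] + w < d[v]:
            raise ValueError("negative cycle reachable from source")
    return d
```

Each d[v] is exactly the running estimate the notes describe: it only ever decreases, and after |V| − 1 rounds of relaxation it equals the true shortest-path distance.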

2014
Bilal Piot, Matthieu Geist, Olivier Pietquin

This paper addresses the problem of batch Reinforcement Learning with Expert Demonstrations (RLED). In RLED, the goal is to find an optimal policy of a Markov Decision Process (MDP), using a data set of fixed sampled transitions of the MDP as well as a data set of fixed expert demonstrations. This is slightly different from the batch Reinforcement Learning (RL) framework where only fixed sample...

Journal: :CoRR 2015
Assaf Hallak, Aviv Tamar, Shie Mannor

Recently, Sutton et al. (2015) introduced the emphatic temporal differences (ETD) algorithm for off-policy evaluation in Markov decision processes. In this short note, we show that the projected fixed-point equation that underlies ETD involves a contraction operator with a √γ-contraction modulus (where γ is the discount factor). This allows us to provide error bounds on the approximation erro...
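The √γ modulus in the note concerns the projected ETD fixed-point equation specifically. As a simpler, related sanity check, the policy Bellman operator TV = r + γPV is a γ-contraction in the sup-norm whenever P is row-stochastic, which a few lines verify numerically (the random P and r are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
gamma, S = 0.9, 5
P = rng.random((S, S))
P /= P.sum(axis=1, keepdims=True)   # row-stochastic transition matrix
r = rng.random(S)

def T(V):
    # Policy Bellman operator: (T V) = r + gamma * P V
    return r + gamma * (P @ V)

V1, V2 = rng.random(S), rng.random(S)
lhs = np.max(np.abs(T(V1) - T(V2)))          # ||T V1 - T V2||_inf
rhs = gamma * np.max(np.abs(V1 - V2))        # gamma * ||V1 - V2||_inf
# lhs <= rhs: T is a gamma-contraction in the sup-norm
```

The inequality holds because T(V1) − T(V2) = γP(V1 − V2), and multiplying by a row-stochastic matrix cannot increase the sup-norm.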

Journal: :CoRR 2017
Kun Li, Yanan Sui, Joel W. Burdick

This paper develops an inverse reinforcement learning algorithm aimed at recovering a reward function from the observed actions of an agent. We introduce a strategy to flexibly handle different types of actions with two approximations of the Bellman Optimality Equation, and a Bellman Gradient Iteration method to compute the gradient of the Q-value with respect to the reward function. These metho...
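A sketch of how such a gradient iteration can look, using a log-sum-exp (soft-max) smoothing of the Bellman Optimality Equation so that the Q-value is differentiable in the reward. The smoothing parameter k and the toy MDP are assumptions for illustration, not the authors' exact formulation:

```python
import numpy as np

rng = np.random.default_rng(1)
S, A, gamma, k = 4, 2, 0.9, 2.0           # toy sizes; k sharpens the soft max

P = rng.random((S, A, S))                  # illustrative random MDP
P /= P.sum(axis=2, keepdims=True)
r = rng.random(S)                          # state-dependent reward

def soft_q(r, iters=600):
    # Value iteration with max_a replaced by (1/k) * logsumexp(k * Q).
    Q = np.zeros((S, A))
    for _ in range(iters):
        V = np.log(np.exp(k * Q).sum(axis=1)) / k
        Q = r[:, None] + gamma * (P @ V)
    return Q

def soft_q_grad(r, iters=600):
    # Jointly iterate Q and G[s, a, j] = dQ[s, a] / dr[j] via the chain rule.
    Q = np.zeros((S, A))
    G = np.zeros((S, A, S))
    I = np.eye(S)
    for _ in range(iters):
        V = np.log(np.exp(k * Q).sum(axis=1)) / k
        pi = np.exp(k * Q)
        pi /= pi.sum(axis=1, keepdims=True)        # d(soft value)/dQ = soft-max
        dV = (pi[:, :, None] * G).sum(axis=1)      # dV[z, j] = dV(z)/dr[j]
        Q = r[:, None] + gamma * (P @ V)
        G = I[:, None, :] + gamma * np.einsum('saz,zj->saj', P, dV)
    return Q, G
```

Because the soft-max value is differentiable, the gradient map is itself a γ-contraction, so G converges to the exact Jacobian of the smoothed Q-value; a finite-difference check against soft_q confirms it.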
