Search results for: bellman
Number of results: 4956
This paper deals with approximate value iteration (AVI) algorithms applied to discounted dynamic programming (DP) problems. For a fixed control policy, the span semi-norm of the so-called Bellman residual is shown to be convex in the Banach space of candidate solutions to the DP problem. This fact motivates the introduction of an AVI algorithm with local search that seeks to minimize the span s...
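The snippet stops before any formulas; for reference, the span semi-norm and fixed-policy Bellman residual it refers to are standardly written as follows (notation assumed here, not quoted from the paper):

\[
\operatorname{sp}(v) = \max_{s \in S} v(s) - \min_{s \in S} v(s),
\qquad
B_\pi v = r_\pi + \gamma P_\pi v - v ,
\]

so the convexity claim is that \( v \mapsto \operatorname{sp}(B_\pi v) \) is convex over candidate solutions v, which is what makes a local search that minimizes the span well behaved.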
This paper presents two universal algorithms for generalized discrete matrix Bellman equations with a symmetric Toeplitz matrix. The algorithms are semiring extensions of two well-known methods for solving Toeplitz systems in ordinary linear algebra.
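For intuition only, here is a minimal hypothetical sketch (not the paper's Toeplitz-specialized algorithms) of solving a discrete matrix Bellman equation X = (A ⊗ X) ⊕ B by simple iteration in the max-plus semiring, where ⊕ is max and ⊗ is +:

import numpy as np

NEG_INF = -np.inf  # additive identity of the max-plus semiring

def maxplus_matvec(A, x):
    """Max-plus product: (A ⊗ x)_i = max_j (A[i, j] + x[j])."""
    return np.max(A + x[None, :], axis=1)

def solve_bellman_maxplus(A, b, iters=100):
    """Iterate x <- (A ⊗ x) ⊕ b; converges to the least solution A* ⊗ b
    when A has no positive-weight cycles (toy solver, for illustration)."""
    x = b.copy()
    for _ in range(iters):
        x_new = np.maximum(maxplus_matvec(A, x), b)
        if np.array_equal(x_new, x):
            break
        x = x_new
    return x

# Toy symmetric Toeplitz matrix: entry (i, j) depends only on |i - j|
c = np.array([0.0, -1.0, -3.0])
A = np.array([[c[abs(i - j)] for j in range(3)] for i in range(3)])
b = np.array([0.0, NEG_INF, NEG_INF])
print(solve_bellman_maxplus(A, b))  # -> [ 0. -1. -2.]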
Some inequalities in 2-inner product spaces generalizing Bessel's result, analogous to the Boas-Bellman inequality in inner product spaces, are given. Applications to determinantal integral inequalities are also provided.
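For reference, the classical Boas-Bellman inequality in an inner product space, which these results parallel, states that for vectors x, y_1, ..., y_n:

\[
\sum_{i=1}^{n} |\langle x, y_i \rangle|^2
\;\le\; \|x\|^2 \left( \max_{1 \le i \le n} \|y_i\|^2
+ \Big( \sum_{1 \le i \ne j \le n} |\langle y_i, y_j \rangle|^2 \Big)^{1/2} \right),
\]

which reduces to Bessel's inequality when the y_i are orthonormal.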
We consider the stochastic optimal control problem for a McKean-Vlasov stochastic differential equation in which the coefficients may depend upon the joint law of the state and the control. By using feedback controls, we reformulate the problem as a deterministic control problem with only the marginal distribution of the process as the controlled state variable, and prove that the dynamic programming principle...
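The snippet does not display the model; one common formulation consistent with coefficients that depend on the joint law of state and control is (notation assumed):

\[
dX_t = b\big(X_t, \mathbb{P}_{(X_t, \alpha_t)}, \alpha_t\big)\,dt
     + \sigma\big(X_t, \mathbb{P}_{(X_t, \alpha_t)}, \alpha_t\big)\,dW_t ,
\]

and the reformulation described above then treats the flow of marginal laws \( t \mapsto \mathbb{P}_{X_t} \) as the state of a deterministic control problem.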
Multimedia applications are Quality of Service (QoS) sensitive, which makes QoS support indispensable in high-speed Integrated Services Packet Networks (ISPN). An important aspect is QoS routing, namely, the provision of QoS routes at session setup time based on user requests and information about available network resources. This paper develops optimal QoS routing algorithms within an Autonomo...
The trapdoor channel is a binary input/output/state channel whose state changes deterministically as the modulo-2 sum of the current input, output, and state. At each state, it behaves as one of two Z channels, each with crossover probability 1/2. Permuter et al. formulated the problem of finding the capacity of the trapdoor channel with feedback as a stochastic control problem. By solving the co...
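A minimal simulation sketch of the channel law just described (hypothetical helper, assuming the usual ball-passing interpretation in which the incoming and stored balls are emitted with equal probability when they differ):

import random

def trapdoor_channel(inputs, state=0, seed=0):
    """Simulate the binary trapdoor channel: when the incoming bit x
    differs from the stored state s, either value is emitted with
    probability 1/2 (a Z channel per state); the next state is the
    modulo-2 sum s' = x XOR y XOR s, exactly as described above."""
    rng = random.Random(seed)
    outputs = []
    for x in inputs:
        y = x if x == state else rng.randint(0, 1)  # emitted bit
        state = x ^ y ^ state                       # deterministic state update
        outputs.append(y)
    return outputs

print(trapdoor_channel([1, 0, 1, 1, 0]))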
(We will use this subroutine later on in the lecture for another algorithm, which is why we are defining it as a separate procedure). Informally, we think of d[v] as our current estimate for the shortest path from s to v. The algorithm begins by initializing each d[v] ← ∞, except for our source vertex s, which we initialize so that d[s] = 0 (trivially, the shortest path from s to s is length 0)...
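A sketch of the subroutine and one algorithm that typically uses it, Bellman-Ford (function names are mine, assuming the standard presentation):

import math

def initialize_single_source(vertices, s):
    """The subroutine from the notes: d[v] <- infinity for every v, d[s] <- 0."""
    d = {v: math.inf for v in vertices}
    d[s] = 0
    return d

def bellman_ford(vertices, edges, s):
    """Single-source shortest paths; edges is a list of (u, v, w) triples."""
    d = initialize_single_source(vertices, s)
    for _ in range(len(vertices) - 1):
        for u, v, w in edges:
            if d[u] + w < d[v]:  # relax edge (u, v)
                d[v] = d[u] + w
    return d

edges = [('s', 'a', 2), ('a', 'b', -1), ('s', 'b', 4)]
print(bellman_ford(['s', 'a', 'b'], edges, 's'))  # {'s': 0, 'a': 2, 'b': 1}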
This paper addresses the problem of batch Reinforcement Learning with Expert Demonstrations (RLED). In RLED, the goal is to find an optimal policy of a Markov Decision Process (MDP), using a data set of fixed sampled transitions of the MDP as well as a data set of fixed expert demonstrations. This is slightly different from the batch Reinforcement Learning (RL) framework where only fixed sample...
Recently, Sutton et al. (2015) introduced the emphatic temporal differences (ETD) algorithm for off-policy evaluation in Markov decision processes. In this short note, we show that the projected fixed-point equation that underlies ETD involves a contraction operator with a √γ-contraction modulus (where γ is the discount factor). This allows us to provide error bounds on the approximation erro...
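For reference, a contraction modulus of this kind yields the standard fixed-point error bound: if the projected operator ΠT is a √γ-contraction with fixed point v_ETD, the usual triangle-inequality argument gives (notation assumed, not quoted from the note)

\[
\| v_{\mathrm{ETD}} - v^{\pi} \| \;\le\; \frac{1}{1 - \sqrt{\gamma}}\, \| \Pi v^{\pi} - v^{\pi} \| ,
\]

i.e. the approximation error is within a 1/(1 - √γ) factor of the best error achievable in the projected space.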
This paper develops an inverse reinforcement learning algorithm aimed at recovering a reward function from the observed actions of an agent. We introduce a strategy to flexibly handle different types of actions with two approximations of the Bellman Optimality Equation, and a Bellman Gradient Iteration method to compute the gradient of the Q-value with respect to the reward function. These metho...
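A hypothetical sketch of the idea (the snippet does not show the paper's two specific approximations; here the hard max in the Bellman optimality equation is smoothed with a log-sum-exp so that Q and its gradient with respect to the reward weights can be iterated jointly):

import numpy as np

def bellman_gradient_iteration(P, r_weights, features, gamma=0.9, k=10.0, iters=200):
    """Smooth the Bellman optimality equation with a soft max and iterate
    Q together with dQ/dw (names and smoothing choice are assumptions).

    P:        (A, S, S) transitions, P[a, s, s'] = Pr(s' | s, a)
    features: (S, F) state features; reward r = features @ r_weights
    k:        soft-max sharpness (larger k ~ closer to the hard max)
    """
    A, S, _ = P.shape
    F = features.shape[1]
    r = features @ r_weights            # reward is linear in the weights
    Q = np.zeros((S, A))
    dQ = np.zeros((S, A, F))            # gradient of Q(s, a) w.r.t. r_weights
    for _ in range(iters):
        m = Q.max(axis=1, keepdims=True)
        w = np.exp(k * (Q - m))
        w /= w.sum(axis=1, keepdims=True)               # softmax over actions
        V = (m + np.log(np.exp(k * (Q - m)).sum(axis=1, keepdims=True)) / k).ravel()
        dV = np.einsum('sa,saf->sf', w, dQ)             # exact gradient of the soft max
        for a in range(A):
            Q[:, a] = r + gamma * P[a] @ V
            dQ[:, a, :] = features + gamma * P[a] @ dV
    return Q, dQ

rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(4), size=(2, 4))   # 2 actions, 4 states
Q, dQ = bellman_gradient_iteration(P, rng.standard_normal(4), np.eye(4))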