Search results for: bellman

Number of results: 4956

2009
Marek Petrik Shlomo Zilberstein

Existing value function approximation methods have been successfully used in many applications, but they often lack useful a priori error bounds. We propose approximate bilinear programming, a new formulation of value function approximation that provides strong a priori guarantees. In particular, this approach provably finds an approximate value function that minimizes the Bellman residual. Sol...
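The quantity this abstract bounds can be illustrated concretely. Below is a minimal sketch of computing the Bellman residual of an approximate value function on a toy fixed-policy Markov chain; the transition matrix, rewards, and discount factor are made up for illustration and are not from the paper.

```python
import numpy as np

# Toy 3-state chain (P, r, gamma are illustrative values).
P = np.array([[0.9, 0.1, 0.0],
              [0.0, 0.8, 0.2],
              [0.1, 0.0, 0.9]])
r = np.array([1.0, 0.0, 2.0])
gamma = 0.95

# Exact value function of this chain: V = (I - gamma P)^{-1} r.
V_exact = np.linalg.solve(np.eye(3) - gamma * P, r)

# A crude approximation; any approximation scheme could produce one.
V_approx = np.round(V_exact)

# Bellman residual: (BV)(s) - V(s), where (BV)(s) = r(s) + gamma * (P V)(s).
residual = r + gamma * P @ V_approx - V_approx
print(np.max(np.abs(residual)))  # sup-norm of the residual of the approximation

residual_exact = r + gamma * P @ V_exact - V_exact
print(np.max(np.abs(residual_exact)))  # ~0: the exact V is a fixed point of B
```

Minimizing a norm of this residual, as the abstract describes, directly controls how far the approximate value function is from satisfying the Bellman equation.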

2009
Constantin Udrişte Ionel Ţevy

This paper interrelates the performance criteria involving path independent curvilinear integrals, the multitime maximum principle, the multitime Hamilton-Jacobi-Bellman PDEs and the multitime dynamic programming, to study the linear-quadratic regulator problems and to characterize the optimal control by means of multitime variant of the Riccati PDE that may be viewed as a feedback law. Section...

2017
Nan Jiang Akshay Krishnamurthy Alekh Agarwal John Langford Robert E. Schapire

This paper studies systematic exploration for reinforcement learning with rich observations and function approximation. We introduce a new model called contextual decision processes, that unifies and generalizes most prior settings. Our first contribution is a complexity measure, the Bellman rank , that we show enables tractable learning of near-optimal behavior in these processes and is natura...

2009
Marco da Silva Frédo Durand Jovan Popović

Controllers are necessary for physically-based synthesis of character animation. However, creating controllers requires either manual tuning or expensive computer optimization. We introduce linear Bellman combination as a method for reusing existing controllers. Given a set of controllers for related tasks, this combination creates a controller that performs a new task. It naturally weights the...

Journal: CoRR, 2010
Marek Petrik Shlomo Zilberstein

Existing value function approximation methods have been successfully used in many applications, but they often lack useful a priori error bounds. We propose a new approximate bilinear programming formulation of value function approximation, which employs global optimization. The formulation provides strong a priori guarantees on both robust and expected policy loss by minimizing specific norms ...

Journal: J. Global Optimization, 2007
Bao-Zhu Guo Bing Sun

Using a semi-discrete model that describes the heat transfer of a continuous casting process of steel, this paper is addressed to an optimal control problem of the continuous casting process in the secondary cooling zone with water spray control. The approach is based on the Hamilton–Jacobi–Bellman equation satisfied by the value function. It is shown that the value function is the viscosity so...

2007
Daniel Schneegaß Steffen Udluft Thomas Martinetz

In this paper we present two substantial extensions of Neural Rewards Regression (NRR) [1]. In order to give a less biased estimator of the Bellman Residual and to facilitate the regression character of NRR, we incorporate an improved, Auxiliared Bellman Residual [2] and provide, to the best of our knowledge, the first Neural Network based implementation of the novel Bellman Residual minimisati...

2010
Amir Massoud Farahmand Rémi Munos Csaba Szepesvári

We address the question of how the approximation error/Bellman residual at each iteration of the Approximate Policy/Value Iteration algorithms influences the quality of the resulting policy. We quantify the performance loss as the Lp norm of the approximation error/Bellman residual at each iteration. Moreover, we show that the performance loss depends on the expectation of the squared Radon-Niko...

2017
Dave Mount

All-Pairs Shortest Paths: Earlier, we saw that Dijkstra’s algorithm and the Bellman-Ford algorithm both solved the problem of computing shortest paths in graphs from a single source vertex. Suppose that we want instead to compute shortest paths between all pairs of vertices. We could do this by applying either Dijkstra or Bellman-Ford using every vertex as a source, but today we will consider an a...
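The single-source routine these notes build on can be sketched briefly. This is a minimal, illustrative Bellman-Ford implementation (the graph encoding and function name are assumptions, not the lecture's code):

```python
def bellman_ford(n, edges, src):
    """Single-source shortest paths, negative edge weights allowed.
    edges: list of (u, v, w) triples on vertices 0..n-1.
    Returns the distance list, or None if a negative cycle is
    reachable from src."""
    INF = float("inf")
    dist = [INF] * n
    dist[src] = 0
    for _ in range(n - 1):              # relax every edge up to n-1 times
        changed = False
        for u, v, w in edges:
            if dist[u] != INF and dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                changed = True
        if not changed:                 # early exit once distances settle
            break
    for u, v, w in edges:               # one extra pass detects negative cycles
        if dist[u] != INF and dist[u] + w < dist[v]:
            return None
    return dist
```

Running this from every source solves all-pairs shortest paths in O(n^2 m) time, which motivates the dedicated all-pairs algorithms the notes go on to discuss.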

2015
P. AZIMZADEH

This work is motivated by numerical solutions to Hamilton-Jacobi-Bellman quasivariational inequalities (HJBQVIs) associated with combined stochastic and impulse control problems. In particular, we consider (i) direct control, (ii) penalized, and (iii) explicit control schemes applied to the HJBQVI problem. Scheme (i) takes the form of a Bellman problem involving an operator which is not necessa...
