Search results for: bellman

Number of results: 4956

2009
Marek Petrik Shlomo Zilberstein

Existing value function approximation methods have been successfully used in many applications, but they often lack useful a priori error bounds. We propose approximate bilinear programming, a new formulation of value function approximation that provides strong a priori guarantees. In particular, this approach provably finds an approximate value function that minimizes the Bellman residual. Sol...
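The quantity this abstract bounds can be illustrated concretely. Below is a minimal sketch of computing the Bellman residual of an approximate value function on a toy fixed-policy Markov chain; the transition matrix, rewards, and discount factor are made up for illustration and are not from the paper.

```python
import numpy as np

# Toy 3-state chain (P, r, gamma are illustrative values).
P = np.array([[0.9, 0.1, 0.0],
              [0.0, 0.8, 0.2],
              [0.1, 0.0, 0.9]])
r = np.array([1.0, 0.0, 2.0])
gamma = 0.95

# Exact value function of this chain: V = (I - gamma P)^{-1} r.
V_exact = np.linalg.solve(np.eye(3) - gamma * P, r)

# A crude approximation; any approximation scheme could produce one.
V_approx = np.round(V_exact)

# Bellman residual: (BV)(s) - V(s), where (BV)(s) = r(s) + gamma * (P V)(s).
residual = r + gamma * P @ V_approx - V_approx
print(np.max(np.abs(residual)))  # sup-norm of the residual of the approximation

residual_exact = r + gamma * P @ V_exact - V_exact
print(np.max(np.abs(residual_exact)))  # ~0: the exact V is a fixed point of B
```

Minimizing a norm of this residual, as the abstract describes, directly controls how far the approximate value function is from satisfying the Bellman equation.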

2009
Constantin Udrişte Ionel Ţevy

This paper interrelates the performance criteria involving path independent curvilinear integrals, the multitime maximum principle, the multitime Hamilton-Jacobi-Bellman PDEs and the multitime dynamic programming, to study the linear-quadratic regulator problems and to characterize the optimal control by means of multitime variant of the Riccati PDE that may be viewed as a feedback law. Section...

2017
Nan Jiang Akshay Krishnamurthy Alekh Agarwal John Langford Robert E. Schapire

This paper studies systematic exploration for reinforcement learning with rich observations and function approximation. We introduce a new model called contextual decision processes, that unifies and generalizes most prior settings. Our first contribution is a complexity measure, the Bellman rank , that we show enables tractable learning of near-optimal behavior in these processes and is natura...

2009
Marco da Silva Frédo Durand Jovan Popović

Controllers are necessary for physically-based synthesis of character animation. However, creating controllers requires either manual tuning or expensive computer optimization. We introduce linear Bellman combination as a method for reusing existing controllers. Given a set of controllers for related tasks, this combination creates a controller that performs a new task. It naturally weights the...

Journal: CoRR, 2010
Marek Petrik Shlomo Zilberstein

Existing value function approximation methods have been successfully used in many applications, but they often lack useful a priori error bounds. We propose a new approximate bilinear programming formulation of value function approximation, which employs global optimization. The formulation provides strong a priori guarantees on both robust and expected policy loss by minimizing specific norms ...

Journal: J. Global Optimization, 2007
Bao-Zhu Guo Bing Sun

Using a semi-discrete model that describes the heat transfer of a continuous casting process of steel, this paper is addressed to an optimal control problem of the continuous casting process in the secondary cooling zone with water spray control. The approach is based on the Hamilton–Jacobi–Bellman equation satisfied by the value function. It is shown that the value function is the viscosity so...

2007
Daniel Schneegaß Steffen Udluft Thomas Martinetz

In this paper we present two substantial extensions of Neural Rewards Regression (NRR) [1]. In order to give a less biased estimator of the Bellman Residual and to facilitate the regression character of NRR, we incorporate an improved, Auxiliared Bellman Residual [2] and provide, to the best of our knowledge, the first Neural Network based implementation of the novel Bellman Residual minimisati...

2010
Amir Massoud Farahmand Rémi Munos Csaba Szepesvári

We address the question of how the approximation error/Bellman residual at each iteration of the Approximate Policy/Value Iteration algorithms influences the quality of the resulting policy. We quantify the performance loss as the Lp norm of the approximation error/Bellman residual at each iteration. Moreover, we show that the performance loss depends on the expectation of the squared Radon-Niko...

2017
Dave Mount

All-Pairs Shortest Paths: Earlier, we saw that Dijkstra’s algorithm and the Bellman-Ford algorithm both solved the problem of computing shortest paths in graphs from a single source vertex. Suppose that we want instead to compute shortest paths between all pairs of vertices. We could do this by applying either Dijkstra or Bellman-Ford using every vertex as a source, but today we will consider an a...
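The single-source routine these notes build on can be sketched briefly. This is a minimal, illustrative Bellman-Ford implementation (the graph encoding and function name are assumptions, not the lecture's code):

```python
def bellman_ford(n, edges, src):
    """Single-source shortest paths, negative edge weights allowed.
    edges: list of (u, v, w) triples on vertices 0..n-1.
    Returns the distance list, or None if a negative cycle is
    reachable from src."""
    INF = float("inf")
    dist = [INF] * n
    dist[src] = 0
    for _ in range(n - 1):              # relax every edge up to n-1 times
        changed = False
        for u, v, w in edges:
            if dist[u] != INF and dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                changed = True
        if not changed:                 # early exit once distances settle
            break
    for u, v, w in edges:               # one extra pass detects negative cycles
        if dist[u] != INF and dist[u] + w < dist[v]:
            return None
    return dist
```

Running this from every source solves all-pairs shortest paths in O(n^2 m) time, which motivates the dedicated all-pairs algorithms the notes go on to discuss.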

2015
P. AZIMZADEH

This work is motivated by numerical solutions to Hamilton-Jacobi-Bellman quasivariational inequalities (HJBQVIs) associated with combined stochastic and impulse control problems. In particular, we consider (i) direct control, (ii) penalized, and (iii) explicit control schemes applied to the HJBQVI problem. Scheme (i) takes the form of a Bellman problem involving an operator which is not necessa...
