Search results for: policy iterations

Number of results: 276392

1994
Zhiyuan Li Trung N. Nguyen

In the decision regarding static scheduling vs. dynamic scheduling, the only argument against the former is the potential imbalance of the workload. However, it has never been clear how the workload is distributed over the iterations of Fortran parallel loops. This work examines a set of Perfect benchmark programs [2] and reports two striking results. First, when using operation counts as the mea...

Journal: :CoRR 2015
Ali Heydari

Adaptive optimal control of nonlinear dynamic systems with deterministic and known dynamics under a known undiscounted infinite-horizon cost function is investigated. A policy iteration scheme initiated with a stabilizing initial control is analyzed for solving the problem. The convergence of the iterations and the optimality of the limit functions, which follows from the established uniqueness o...
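The abstract above concerns policy iteration for continuous-state optimal control; the scheme itself is easiest to see on a small finite MDP. The sketch below alternates exact policy evaluation (solving a linear system) with greedy policy improvement. All numbers (states, transition probabilities, costs, discount) are invented for illustration and are not from the paper, which treats the undiscounted case.

```python
import numpy as np

# Minimal policy iteration on an invented 2-state, 2-action MDP.
P = {  # P[a][s, s'] = transition probability under action a
    0: np.array([[0.9, 0.1], [0.2, 0.8]]),
    1: np.array([[0.5, 0.5], [0.6, 0.4]]),
}
C = np.array([[1.0, 2.0],   # C[s, a] = one-step cost
              [3.0, 0.5]])
gamma = 0.95  # discount factor (illustrative; the paper is undiscounted)

policy = np.zeros(2, dtype=int)  # arbitrary initial policy
while True:
    # Policy evaluation: solve (I - gamma * P_pi) V = C_pi exactly.
    P_pi = np.array([P[policy[s]][s] for s in range(2)])
    C_pi = C[np.arange(2), policy]
    V = np.linalg.solve(np.eye(2) - gamma * P_pi, C_pi)
    # Policy improvement: act greedily with respect to V.
    Q = np.array([[C[s, a] + gamma * P[a][s] @ V for a in (0, 1)]
                  for s in range(2)])
    new_policy = Q.argmin(axis=1)
    if np.array_equal(new_policy, policy):
        break  # greedy policy is unchanged: it is optimal
    policy = new_policy

print(policy, V)
```

At termination the evaluated value function satisfies the Bellman optimality equation, which is the fixed-point/uniqueness argument such convergence proofs build on.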

2016
Mohammadsadegh Mohagheghi

Markov Decision Processes (MDPs) are used to model both non-deterministic and probabilistic systems. Probabilistic model checking is an approach for verifying quantitative properties of probabilistic systems that are modeled as MDPs. Value and Policy Iteration, and modified versions of them, are well-known approaches for computing a wide range of probabilistic properties. This paper tries to impro...
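A core computation in MDP-based probabilistic model checking is the maximal probability of reaching a goal state, obtainable by value iteration. The toy MDP below (states, actions, and probabilities all invented for illustration) shows the fixpoint iteration; it is not taken from the paper.

```python
import numpy as np

# Value iteration for maximal reachability probability in a tiny MDP.
# States 0 and 1 are transient; state 2 is the goal, state 3 a failure
# state; both are absorbing. trans[s][a] = list of (s', prob).
trans = {
    0: [[(1, 0.6), (3, 0.4)], [(2, 0.3), (0, 0.7)]],
    1: [[(2, 0.8), (3, 0.2)]],
}
goal, n = 2, 4

p = np.zeros(n)
p[goal] = 1.0
for _ in range(1000):  # iterate toward the least fixpoint
    new = p.copy()
    for s, actions in trans.items():
        # maximize over the non-deterministic action choice
        new[s] = max(sum(pr * p[t] for t, pr in a) for a in actions)
    if np.max(np.abs(new - p)) < 1e-12:
        break
    p = new

print(p)  # p[s] = max probability of eventually reaching the goal from s
```

From state 0, the second action self-loops with escape probability 0.3 to the goal, so the maximal reachability probability there converges to 1.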

2017
Feng Wu Shlomo Zilberstein Xiaoping Chen

We propose a novel baseline regret minimization algorithm for multi-agent planning problems modeled as finite-horizon decentralized POMDPs. It is guaranteed to produce a policy that is provably at least as good as a given baseline policy. We also propose an iterative belief generation algorithm to efficiently minimize the baseline regret, which requires only the iterations necessary to converge ...

Journal: :Operations Research 1984
Awi Federgruen Paul H. Zipkin

This paper presents an algorithm to compute an optimal (s, S) policy under standard assumptions (stationary data, well-behaved one-period costs, discrete demand, full backlogging, and the average-cost criterion). The method is iterative, starting with an arbitrary, given (s, S) policy and converging to an optimal policy in a finite number of iterations. Any of the available approximations can t...
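The (s, S) policy in the abstract orders up to level S whenever inventory falls below s. The snippet below is a Monte Carlo evaluation of the average cost of one fixed (s, S) policy; the demand distribution and cost rates are invented for illustration, and this is only the policy evaluation half, not the paper's optimization algorithm.

```python
import random

# Estimate the long-run average cost of a fixed (s, S) inventory policy
# under discrete demand with full backlogging (negative inventory).
random.seed(0)
s_reorder, S_cap = 2, 6          # reorder point s and order-up-to level S
K, h, b = 10.0, 1.0, 4.0         # fixed order, holding, and backlog costs

inv, total = S_cap, 0.0
T = 200_000
for _ in range(T):
    if inv < s_reorder:          # review: order up to S_cap if below s
        total += K
        inv = S_cap
    demand = random.choice([0, 1, 2, 3])  # uniform discrete demand
    inv -= demand
    total += h * max(inv, 0) + b * max(-inv, 0)

print(total / T)  # estimated average cost per period of this (s, S) policy
```

Embedding such an evaluation in a policy-improvement loop over candidate (s, S) pairs is the flavor of iterative scheme the abstract describes, though the paper works with exact one-period cost functions rather than simulation.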

Journal: :CoRR 2016
Yichen Chen Mengdi Wang

We study the online estimation of the optimal policy of a Markov decision process (MDP). We propose a class of Stochastic Primal-Dual (SPD) methods which exploit the inherent minimax duality of Bellman equations. The SPD methods update a few coordinates of the value and policy estimates as a new state transition is observed. These methods use small storage and have low computational complexity p...

Journal: :CoRR 2016
Richard Liaw Sanjay Krishnan Animesh Garg Daniel Crankshaw Joseph Gonzalez Kenneth Y. Goldberg

Rather than learning new control policies for each new task, it is possible, when tasks share some structure, to compose a "meta-policy" from previously learned policies. This paper reports results from experiments using Deep Reinforcement Learning on a continuous-state, discrete-action autonomous driving simulator. We explore how Deep Neural Networks can represent meta-policies that switch amo...

2005
Niket S. Kaisare Jong Min Lee Jay H. Lee

In this paper, we empirically investigate the convergence properties of policy iteration applied to the optimal control of systems with continuous state and action spaces. We demonstrate that policy iteration requires fewer iterations than value iteration to converge, but requires more function evaluations to generate cost-to-go approximations in the policy evaluation step. Two different alter...
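The trade-off the abstract describes can be reproduced on a finite MDP: policy iteration terminates in far fewer outer iterations than value iteration, but each of its iterations pays for a full policy evaluation (here a linear solve; in the paper, many cost-to-go function evaluations). The MDP below is randomly generated for illustration, not the continuous-state benchmark from the paper.

```python
import numpy as np

# Compare iteration counts of value iteration and policy iteration
# on a random finite MDP (invented for illustration).
rng = np.random.default_rng(0)
nS, nA, gamma = 20, 4, 0.95
P = rng.dirichlet(np.ones(nS), size=(nS, nA))  # P[s, a] is a distribution
C = rng.random((nS, nA))                       # one-step costs

# Value iteration: cheap sweeps, but many of them.
V, vi_iters = np.zeros(nS), 0
while True:
    newV = (C + gamma * P @ V).min(axis=1)
    vi_iters += 1
    if np.max(np.abs(newV - V)) < 1e-10:
        break
    V = newV

# Policy iteration: few outer iterations, each with a full policy
# evaluation (the expensive step the abstract points to).
policy, pi_iters = np.zeros(nS, dtype=int), 0
while True:
    P_pi = P[np.arange(nS), policy]
    C_pi = C[np.arange(nS), policy]
    Vp = np.linalg.solve(np.eye(nS) - gamma * P_pi, C_pi)
    new_policy = (C + gamma * P @ Vp).argmin(axis=1)
    pi_iters += 1
    if np.array_equal(new_policy, policy):
        break
    policy = new_policy

print(vi_iters, pi_iters)  # policy iteration typically needs far fewer
```

Both loops converge to the same optimal value function, so the comparison isolates iteration count rather than solution quality.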

Journal: :Journal of Mathematical Analysis and Applications 1992

Journal: :Časopis pro pěstování matematiky a fysiky 1928

[Chart: number of search results per year]