Empirical Results on Convergence and Exploration in Approximate Policy Iteration
Abstract
In this paper, we empirically investigate the convergence properties of policy iteration applied to the optimal control of systems with continuous state and action spaces. We demonstrate that policy iteration requires fewer iterations than value iteration to converge, but more function evaluations to generate cost-to-go approximations in the policy evaluation step. Two alternatives to policy evaluation are presented, one based on iteration over simulated states and the other on simulation of improved policies. We then demonstrate that the λ-policy iteration method, with λ ∈ [0, 1], is a tradeoff between value iteration and policy iteration. Finally, we consider the issue of exploration to expand the coverage of the state space during offline iteration. Copyright © 2005 IFAC
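The iteration-count claim in the abstract can be illustrated on a toy problem. The sketch below is not the paper's experiment: it assumes a small finite, discounted MDP with randomly generated transitions and costs (the helper `random_mdp`, the discount factor 0.95, and the tolerance are all hypothetical choices), whereas the paper treats continuous state and action spaces with approximate cost-to-go functions. Policy evaluation is done here by an exact linear solve, which stands in for the more expensive evaluation step the abstract refers to.

```python
import numpy as np


def random_mdp(n_states=20, n_actions=4, seed=0):
    """Hypothetical toy problem: transitions P[a, s, s'] and costs c[s, a]."""
    rng = np.random.default_rng(seed)
    P = rng.random((n_actions, n_states, n_states))
    P /= P.sum(axis=2, keepdims=True)  # normalize each row to a distribution
    cost = rng.random((n_states, n_actions))
    return P, cost


def q_values(P, cost, J, gamma):
    """One-step lookahead: Q[s, a] = c[s, a] + gamma * sum_s' P[a, s, s'] J[s']."""
    return cost + gamma * np.einsum("ast,t->sa", P, J)


def value_iteration(P, cost, gamma=0.95, tol=1e-8, max_iter=100_000):
    """Repeat the Bellman backup until the sup-norm change drops below tol."""
    J = np.zeros(cost.shape[0])
    for k in range(1, max_iter + 1):
        J_new = q_values(P, cost, J, gamma).min(axis=1)
        converged = np.max(np.abs(J_new - J)) < tol
        J = J_new
        if converged:
            break
    policy = q_values(P, cost, J, gamma).argmin(axis=1)
    return J, policy, k


def policy_iteration(P, cost, gamma=0.95, max_iter=1_000):
    """Alternate exact policy evaluation and greedy policy improvement."""
    n_states = cost.shape[0]
    states = np.arange(n_states)
    policy = np.zeros(n_states, dtype=int)  # arbitrary initial policy
    for k in range(1, max_iter + 1):
        # Policy evaluation: solve (I - gamma * P_mu) J = c_mu.
        P_mu = P[policy, states, :]
        c_mu = cost[states, policy]
        J = np.linalg.solve(np.eye(n_states) - gamma * P_mu, c_mu)
        # Policy improvement: choose the cost-minimizing action in each state.
        new_policy = q_values(P, cost, J, gamma).argmin(axis=1)
        if np.array_equal(new_policy, policy):
            break  # policy is stable, so it is greedy for its own cost-to-go
        policy = new_policy
    return J, policy, k


if __name__ == "__main__":
    P, cost = random_mdp()
    _, _, vi_iters = value_iteration(P, cost)
    _, _, pi_iters = policy_iteration(P, cost)
    print("value iteration sweeps:", vi_iters)
    print("policy iteration sweeps:", pi_iters)
```

Each policy iteration sweep solves a full linear system, while each value iteration sweep performs only one backup, so comparing sweep counts alone understates the per-iteration cost of policy iteration. That is the tradeoff the abstract describes.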
Related papers
Some New Existence, Uniqueness and Convergence Results for Fractional Volterra-Fredholm Integro-Differential Equations
This paper presents a study of some significant recent innovations in approximation techniques for finding approximate solutions of Caputo fractional Volterra-Fredholm integro-differential equations. To this aim, the study uses the modified Adomian decomposition method (MADM) and the modified variational iteration method (MVIM). A wider applicability of these techniques is based on thei...
Pareto-optimal Solutions for Multi-objective Optimal Control Problems using Hybrid IWO/PSO Algorithm
Heuristic optimization provides a robust and efficient approach for extracting approximate solutions of multi-objective problems because of their capability to evolve a set of non-dominated solutions distributed along the Pareto frontier. The convergence rate and suitable diversity of solutions are of great importance for multi-objective evolutionary algorithms. The focu...
Approximate Policy Iteration: A Survey and Some New Methods
We consider the classical policy iteration method of dynamic programming (DP), where approximations and simulation are used to deal with the curse of dimensionality. We survey a number of issues: convergence and rate of convergence of approximate policy evaluation methods, singularity and susceptibility to simulation noise of policy evaluation, exploration issues, constrained and enhanced polic...
Solving Multi-objective Optimal Control Problems of chemical processes using Hybrid Evolutionary Algorithm
Evolutionary algorithms have been recognized to be suitable for extracting approximate solutions of multi-objective problems because of their capability to evolve a set of non-dominated solutions distributed along the Pareto frontier. This paper applies an evolutionary optimization scheme, inspired by Multi-objective Invasive Weed Optimization (MOIWO) and Non-dominated Sorting (NS) strategi...
Error Bounds for Approximate Policy Iteration
In Dynamic Programming, convergence of algorithms such as Value Iteration or Policy Iteration results, in discounted problems, from a contraction property of the back-up operator, guaranteeing convergence to its fixed point. When approximation is considered, known results in Approximate Policy Iteration provide bounds on the closeness to optimality of the approximate value function obtained by suc...