Search results for: policy iterations
Number of results: 276392
Convergence is a central problem in both computer science and population biology. Will a program terminate? Will a population go to an equilibrium? In general these questions are quite difficult – even unsolvable. In this paper we will concentrate on very simple iterations of the form
In this talk we will consider three properties of iterations with mixed (finite/countable) supports: iterations of arbitrary length preserve ω1, iterations of length ≤ ω2 over a model of CH have the ℵ2-chain condition, and iterations of length < ω2 over a model of CH do not increase the size of the continuum. Definition 1. Let Pκ be an iterated forcing construction of length κ, with iterands 〈Q̇α...
A complete description of the iterated monodromy groups of postcritically finite backward polynomial iterations is given in terms of their actions on rooted trees and automata generating them. We describe an iterative algorithm for finding kneading automata associated with postcritically finite topological polynomials and discuss some open questions about iterated monodromy groups of polynomials.
In this paper we are interested in the convergence analysis of the Stochastic Dual Dynamic Programming (SDDP) algorithm in a general framework, regardless of whether the underlying probability space is discrete or not. We consider a convex stochastic control program, not necessarily linear, and the resulting dynamic programming equation. We prove under mild assumptions that the approximations o...
Guided policy search algorithms can be used to optimize complex nonlinear policies, such as deep neural networks, without directly computing policy gradients in the high-dimensional parameter space. Instead, these methods use supervised learning to train the policy to mimic a “teacher” algorithm, such as a trajectory optimizer or a trajectory-centric reinforcement learning method. Guided policy...