Search results for: action value function

Number of results: 2,342,819

Journal: CoRR, 2017
Vladimir Marochko, Leonard Johard, Manuel Mazzara

Catastrophic forgetting is of special importance in reinforcement learning, as the data distribution is generally non-stationary over time. We study and compare several pseudorehearsal approaches for Q-learning with function approximation in a pole-balancing task. We have found that pseudorehearsal seems to assist learning even in such very simple problems, given proper initialization of the reh...
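
A minimal sketch of the pseudorehearsal idea this abstract describes, assuming a small PyTorch Q-network on a CartPole-style state: random pseudo-states are passed through the current network, the resulting outputs are stored, and replaying these pairs alongside new transitions penalizes drift away from previously learned behavior. The names (QNet, make_pseudo_items, train_step), the network size, and the loss weighting are illustrative, not the paper's code.

```python
# Hedged sketch of pseudorehearsal for Q-learning with a neural approximator.
import torch
import torch.nn as nn

class QNet(nn.Module):
    """Small MLP mapping a state to one Q-value per discrete action."""
    def __init__(self, state_dim=4, n_actions=2, hidden=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions))

    def forward(self, s):
        return self.net(s)

def make_pseudo_items(q_net, n_items=64, state_dim=4):
    """Pseudorehearsal: sample random inputs and record the network's current
    outputs; replaying these pairs later discourages the network from
    forgetting its earlier input-output mapping."""
    with torch.no_grad():
        pseudo_states = torch.randn(n_items, state_dim)
        pseudo_targets = q_net(pseudo_states)
    return pseudo_states, pseudo_targets

def train_step(q_net, optimizer, batch, pseudo, gamma=0.99, rehearsal_weight=1.0):
    """One Q-learning update mixed with a rehearsal penalty on pseudo-items."""
    s, a, r, s_next, done = batch
    with torch.no_grad():
        # Standard one-step Q-learning target.
        target = r + gamma * (1 - done) * q_net(s_next).max(dim=1).values
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    td_loss = nn.functional.mse_loss(q_sa, target)
    # Rehearsal loss: keep outputs on pseudo-items close to their stored values.
    pseudo_states, pseudo_targets = pseudo
    rehearsal_loss = nn.functional.mse_loss(q_net(pseudo_states), pseudo_targets)
    loss = td_loss + rehearsal_weight * rehearsal_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```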

1998
Jeff G. Schneider, Justin A. Boyan, Andrew W. Moore

Production scheduling, the problem of sequentially configuring a factory to meet forecasted demands, is a critical problem throughout the manufacturing industry. The requirement of maintaining product inventories in the face of unpredictable demand and stochastic factory output makes standard scheduling models, such as job-shop, inadequate. Currently applied algorithms, such as simulated anneali...

Journal: Journal of Machine Learning Research, 2003
Michail G. Lagoudakis, Ronald Parr

We propose a new approach to reinforcement learning for control problems which combines value-function approximation with linear architectures and approximate policy iteration. This new approach is motivated by the least-squares temporal-difference learning algorithm (LSTD) for prediction problems, which is known for its efficient use of sample experiences compared to pure temporal-difference a...
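
The least-squares scheme described here is commonly presented as LSTD-Q nested inside approximate policy iteration (the LSPI idea). A rough sketch under that reading, where the feature map phi, the ridge term, and the finite action set are illustrative assumptions rather than the paper's exact formulation:

```python
# Hedged sketch of LSTD-Q inside approximate policy iteration.
import numpy as np

def lstdq(samples, phi, policy, n_features, gamma=0.99, ridge=1e-6):
    """Solve the least-squares fixed point A w = b for the action-value
    function of `policy`, estimated from (s, a, r, s_next) samples."""
    A = ridge * np.eye(n_features)
    b = np.zeros(n_features)
    for s, a, r, s_next in samples:
        f = phi(s, a)
        f_next = phi(s_next, policy(s_next))
        A += np.outer(f, f - gamma * f_next)
        b += r * f
    return np.linalg.solve(A, b)

def lspi(samples, phi, actions, n_features, n_iters=20, gamma=0.99, tol=1e-6):
    """Alternate LSTD-Q policy evaluation with greedy improvement over a
    finite action set; reuses the same batch of samples every iteration."""
    w = np.zeros(n_features)
    greedy = lambda s: max(actions, key=lambda a: phi(s, a) @ w)  # uses current w
    for _ in range(n_iters):
        w_new = lstdq(samples, phi, greedy, n_features, gamma)
        converged = np.linalg.norm(w_new - w) < tol
        w = w_new
        if converged:
            break
    return w, greedy
```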

Journal: Japanese Sociological Review, 1962

2007
András Antos, Rémi Munos, Csaba Szepesvári

We consider continuous state, continuous action batch reinforcement learning where the goal is to learn a good policy from a sufficiently rich trajectory generated by some policy. We study a variant of fitted Q-iteration, where the greedy action selection is replaced by searching for a policy in a restricted set of candidate policies by maximizing the average action values. We provide a rigorou...
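
One way to read the variant described above in code: the pointwise greedy maximization of fitted Q-iteration is replaced by a search over a set of candidate policies for the one with the highest average action value on the batch of states. In the sketch below, the extra-trees regressor from scikit-learn and a finite candidate set are simplifying assumptions; the paper's restricted policy class and analysis are more general.

```python
# Hedged sketch of fitted Q-iteration with policy search over candidates.
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

def policy_search_fqi(samples, candidate_policies, n_iters=30, gamma=0.99):
    """samples: list of (s, a, r, s_next) with continuous states and actions.
    candidate_policies: finite list of callables mapping a state to an action."""
    states = np.array([s for s, _, _, _ in samples])
    sa = np.array([np.concatenate([s, np.atleast_1d(a)]) for s, a, _, _ in samples])
    rewards = np.array([r for _, _, r, _ in samples])
    next_states = np.array([sn for _, _, _, sn in samples])

    q = None                            # current fitted Q-function
    policy = candidate_policies[0]      # current policy
    for _ in range(n_iters):
        if q is None:
            targets = rewards
        else:
            # Evaluate next states under the currently selected policy.
            next_sa = np.array([np.concatenate([sn, np.atleast_1d(policy(sn))])
                                for sn in next_states])
            targets = rewards + gamma * q.predict(next_sa)
        q = ExtraTreesRegressor(n_estimators=50).fit(sa, targets)

        # Improvement step: pick the candidate maximizing the average action value.
        def avg_value(pi):
            pi_sa = np.array([np.concatenate([s, np.atleast_1d(pi(s))]) for s in states])
            return q.predict(pi_sa).mean()
        policy = max(candidate_policies, key=avg_value)
    return q, policy
```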

2010
Martha White, Adam White

We prove the consistency and coverage error for the bootstrapped studentized interval around the sample mean of the sequence of parameters for global function approximation. Each parameter vector θ_t on time step t corresponds to the action-value function Q_t on that time step, with Q(s, a) = f(θ, s, a) for some bounded function f. A common example of f is a linear function f(θ, s, a) = θφ(s, ...
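
The two objects named here can be made concrete: a linear action-value function Q(s, a) = θ·φ(s, a), and a bootstrapped studentized (bootstrap-t) interval around the sample mean of the parameter sequence θ_1, ..., θ_T. Plain i.i.d. resampling and per-coordinate intervals are simplifying assumptions in this sketch, not the paper's analysis.

```python
# Hedged sketch: linear action values and a bootstrap-t interval for the
# mean of a sequence of parameter vectors.
import numpy as np

def linear_q(theta, phi, s, a):
    """Q(s, a) = theta . phi(s, a) for a user-supplied feature map phi."""
    return theta @ phi(s, a)

def bootstrap_t_interval(thetas, alpha=0.05, n_boot=2000, seed=None):
    """Bootstrap-t interval for the mean of each coordinate of the parameter
    sequence thetas (array of shape T x d), using i.i.d. resampling."""
    rng = np.random.default_rng(seed)
    thetas = np.asarray(thetas)
    T = thetas.shape[0]
    mean = thetas.mean(axis=0)
    se = thetas.std(axis=0, ddof=1) / np.sqrt(T)
    t_stats = []
    for _ in range(n_boot):
        boot = thetas[rng.integers(0, T, size=T)]
        boot_se = boot.std(axis=0, ddof=1) / np.sqrt(T)
        # Studentized statistic of the resampled mean around the sample mean.
        t_stats.append((boot.mean(axis=0) - mean) / boot_se)
    t_stats = np.array(t_stats)
    t_hi = np.quantile(t_stats, 1 - alpha / 2, axis=0)
    t_lo = np.quantile(t_stats, alpha / 2, axis=0)
    return mean - t_hi * se, mean - t_lo * se   # (lower, upper) per coordinate
```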

Journal: IEEJ Transactions on Fundamentals and Materials, 1990

Journal: International Journal of Mathematical Modelling and Computations
Salah El Ouadih, Radouan Daher

Using a generalized spherical mean operator, we obtain a generalization of Titchmarsh's theorem for the Dunkl transform for functions satisfying the ('; p)-Dunkl Lipschitz condition in the space L^p(R^d; w_l(x) dx), 1 < p ≤ 2, where w_l is a weight function invariant under the action of an associated reflection group.
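
For orientation, the classical Fourier-transform version of Titchmarsh's theorem that this result generalizes is usually stated as below (a standard textbook form, quoted here for context; the paper replaces the Fourier transform and Lebesgue measure with the Dunkl transform and the weight w_l):

```latex
% Classical Titchmarsh theorem on the real line (Fourier case), for
% context only; assumes amsmath is loaded.
For $f \in L^2(\mathbb{R})$ and $0 < \alpha < 1$,
\[
  \|f(\cdot + h) - f\|_{L^2(\mathbb{R})} = O(h^{\alpha}) \quad (h \to 0^+)
  \quad\Longleftrightarrow\quad
  \int_{|\lambda| \ge r} \bigl|\widehat{f}(\lambda)\bigr|^2 \, d\lambda = O(r^{-2\alpha}) \quad (r \to \infty).
\]
```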

Chart: number of search results per year
