Search results for: action value function
Number of results: 2,342,819
Catastrophic forgetting is of special importance in reinforcement learning, as the data distribution is generally non-stationary over time. We study and compare several pseudorehearsal approaches for Q-learning with function approximation in a pole balancing task. We have found that pseudorehearsal seems to assist learning even in such very simple problems, given proper initialization of the reh...
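A minimal sketch of how pseudorehearsal can be combined with Q-learning under a linear function approximator; the feature dimensions, rehearsal schedule, placeholder transitions, and all function names below are illustrative assumptions, not the paper's actual pole-balancing setup.

import numpy as np

rng = np.random.default_rng(0)
n_features, n_actions = 8, 2
W = rng.normal(scale=0.1, size=(n_actions, n_features))  # Q(s, a) = W[a] @ phi(s)

def make_pseudo_items(n_items=32):
    # Random pseudo-states labelled with the approximator's *current* outputs.
    phis = rng.normal(size=(n_items, n_features))
    return phis, phis @ W.T

def q_learning_step(phi, a, r, phi_next, gamma=0.99, lr=0.05):
    # Ordinary one-step Q-learning update on a real transition.
    td_err = r + gamma * np.max(W @ phi_next) - W[a] @ phi
    W[a] += lr * td_err * phi

def rehearse(pseudo_phis, pseudo_targets, lr=0.05):
    # Pull the approximator back toward the stored pseudo-targets so that new
    # updates do not overwrite behaviour elsewhere in the state space.
    for phi, tgt in zip(pseudo_phis, pseudo_targets):
        W[:] += lr * np.outer(tgt - W @ phi, phi)

pseudo_phis, pseudo_targets = make_pseudo_items()
for _ in range(1000):
    phi = rng.normal(size=n_features)           # placeholder "state" features
    a = int(rng.integers(n_actions))
    r, phi_next = rng.normal(), rng.normal(size=n_features)
    q_learning_step(phi, a, r, phi_next)
    rehearse(pseudo_phis, pseudo_targets)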
Production scheduling, the problem of sequentially configuring a factory to meet forecasted demands, is a critical problem throughout the manufacturing industry. The requirement of maintaining product inventories in the face of unpredictable demand and stochastic factory output makes standard scheduling models, such as job-shop, inadequate. Currently applied algorithms, such as simulated anneali...
We propose a new approach to reinforcement learning for control problems which combines value-function approximation with linear architectures and approximate policy iteration. This new approach is motivated by the least-squares temporal-difference learning algorithm (LSTD) for prediction problems, which is known for its efficient use of sample experiences compared to pure temporal-difference a...
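For context, a rough sketch of the LSTD-Q style batch solve that sits inside least-squares policy iteration; the feature map phi, the policy function, and the regularization constant are placeholders, not the authors' implementation.

import numpy as np

def lstdq(samples, phi, policy, k, gamma=0.99, reg=1e-6):
    # samples: iterable of (s, a, r, s_next); phi(s, a) returns a length-k vector.
    A = reg * np.eye(k)        # small ridge term keeps the system invertible
    b = np.zeros(k)
    for s, a, r, s_next in samples:
        f = phi(s, a)
        f_next = phi(s_next, policy(s_next))   # action the current policy would take
        A += np.outer(f, f - gamma * f_next)
        b += r * f
    return np.linalg.solve(A, b)               # weights of the approximate Q-function

Policy iteration then alternates between evaluating the current policy with a solve like this and making the policy greedy with respect to the resulting Q-function.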
We consider continuous state, continuous action batch reinforcement learning where the goal is to learn a good policy from a sufficiently rich trajectory generated by some policy. We study a variant of fitted Q-iteration, where the greedy action selection is replaced by searching for a policy in a restricted set of candidate policies by maximizing the average action values. We provide a rigorou...
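One way to picture the variant described above, sketched here with scikit-learn trees as the regressor; the candidate policy set, the regressor choice, and the data layout are assumptions for illustration only.

import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

def fitted_q_iteration(transitions, candidate_policies, n_iters=20, gamma=0.99):
    # transitions: list of (s, a, r, s_next) with continuous states and actions.
    X = np.array([np.concatenate([s, a]) for s, a, _, _ in transitions])
    rewards = np.array([r for _, _, r, _ in transitions])
    next_states = [s_next for _, _, _, s_next in transitions]

    q = None
    for _ in range(n_iters):
        if q is None:
            targets = rewards
        else:
            # Instead of a greedy max over continuous actions, pick the candidate
            # policy with the largest *average* action value under the current Q.
            def avg_value(pi):
                z = np.array([np.concatenate([s, pi(s)]) for s in next_states])
                return q.predict(z).mean()
            best_pi = max(candidate_policies, key=avg_value)
            z_best = np.array([np.concatenate([s, best_pi(s)]) for s in next_states])
            targets = rewards + gamma * q.predict(z_best)
        q = ExtraTreesRegressor(n_estimators=50, random_state=0).fit(X, targets)
    return q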
We prove the consistency and coverage error for the bootstrapped studentized interval around the sample mean of the sequence of parameters for global function approximation. Each parameter vector θ_t on time step t corresponds to the action-value function Q_t on that time step, with Q_t(s, a) = f(θ_t, s, a) for some bounded function f. A common example of f is a linear function f(θ, s, a) = θφ(s, ...
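To make the linear case concrete: each parameter vector θ_t defines Q_t(s, a) = θ_t · φ(s, a). The feature map and numbers below are made up purely for illustration.

import numpy as np

def phi(s, a, n_actions=3):
    # One block of state features per action (a standard one-hot stacking).
    out = np.zeros(len(s) * n_actions)
    out[a * len(s):(a + 1) * len(s)] = s
    return out

theta_t = np.array([0.5, -0.2, 0.1, 0.0, 0.3, 0.7])   # one snapshot theta_t
s = np.array([1.0, 2.0])
q_t = [theta_t @ phi(s, a) for a in range(3)]         # Q_t(s, a) for each action
print(q_t)                                            # approximately [0.1, 0.1, 1.7]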
Using a generalized spherical mean operator, we obtain a generalization of Titchmarsh's theorem for the Dunkl transform for functions satisfying the ('; p)-Dunkl Lipschitz condition in the space L^p(R^d; w_l(x) dx), 1 < p ≤ 2, where w_l is a weight function invariant under the action of an associated reflection group.