Search results for: temporal difference learning
Number of results: 1,222,164
This paper describes a novel approach based on online unsupervised adaptation and clustering using temporal-difference (TD) learning. Temporal-difference learning is a reinforcement learning technique: a computational approach to learning in which an agent tries to maximize the total reward it receives while interacting with a complex, uncertain environment. The adaptation progres...
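Since the abstract is cut off, the following is only a minimal sketch of the tabular TD(0) update that "temporal-difference learning" refers to, not the paper's adaptive-clustering method; the step size alpha and discount gamma are illustrative assumptions.

```python
# Minimal tabular TD(0) sketch (illustrative, not the paper's method):
# after each transition (s, r, s'), nudge V(s) toward the bootstrapped
# one-step target r + gamma * V(s').
from collections import defaultdict

def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.99):
    """One temporal-difference update; alpha and gamma are assumed values."""
    td_error = r + gamma * V[s_next] - V[s]
    V[s] += alpha * td_error
    return td_error

V = defaultdict(float)           # value table, zero-initialized
td0_update(V, s=0, r=1.0, s_next=1)
```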
Multi-objective problems with correlated objectives are a class of problems that deserve specific attention. In contrast to typical multi-objective problems, they do not require the identification of trade-offs between the objectives, as (near-) optimal solutions for any objective are (near-) optimal for every objective. Intelligently combining the feedback from these objectives, instead of onl...
In this theoretical contribution, we provide mathematical proof that two of the most important classes of network learning, correlation-based differential Hebbian learning and reward-based temporal difference learning, are asymptotically equivalent when the learning is timed by a modulatory signal. This opens the opportunity to consistently reformulate most of the abstract reinforcement learning ...
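As a hedged illustration of the two rule classes the abstract names, in generic notation that need not match the paper's:

```latex
% TD learning moves each weight along the reward-prediction (TD) error,
\dot{\omega}_i \propto \big(r + \gamma V(s') - V(s)\big)\, u_i ,
% while differential Hebbian learning correlates presynaptic activity u_i
% with the temporal derivative of the postsynaptic activity v:
\dot{\omega}_i \propto u_i\, \dot{v} .
```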
We compare Temporal Difference Learning (TDL) with Coevolutionary Learning (CEL) on Othello. In addition to three popular single-criterion performance measures, i) generalization performance or expected utility, ii) the average result against a hand-crafted heuristic, and iii) the result of a head-to-head match, we compare the algorithms using performance profiles. This multi-criteria performance meas...
Temporal difference learning and Residual Gradient methods are the most widely used temporal-difference-based learning algorithms; however, it has been shown that neither of their objective functions is optimal with respect to approximating the true value function V. Two novel algorithms are proposed to approximate the true value function V. This paper makes the following contributions: • A batch algorit...
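For context, the contrast the abstract draws is often stated as follows, in standard notation that may differ from the paper's:

```latex
% Residual-gradient methods descend the mean-squared Bellman error,
J_{\mathrm{RG}}(\theta) = \mathbb{E}\!\left[\big(r + \gamma V_\theta(s') - V_\theta(s)\big)^2\right],
% whereas TD follows a semi-gradient that holds the bootstrapped target
% r + \gamma V_\theta(s') fixed during differentiation; under function
% approximation, neither fixed point need coincide with the true V.
```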
We use a reinforcement learning approach to learn a real-world control problem, the truck backer-upper problem. In this problem, a tractor-trailer truck must be backed into a loading dock from an arbitrary location and orientation. Our approach applies the temporal-difference algorithm with a neural network as the value function approximator. The novelty of this work is the simplicity of our impl...
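A minimal sketch of semi-gradient TD(0) with a small neural-network value function, in the spirit of the abstract; the one-hidden-layer architecture, the four-dimensional state encoding, and the step sizes are illustrative assumptions, not the paper's setup.

```python
# Semi-gradient TD(0) with a tiny MLP value function (illustrative sketch).
import numpy as np

rng = np.random.default_rng(0)

class ValueNet:
    """One-hidden-layer tanh network mapping a state vector to a scalar value."""
    def __init__(self, n_in, n_hidden=32):
        self.W1 = rng.normal(0.0, 0.1, (n_hidden, n_in))
        self.b1 = np.zeros(n_hidden)
        self.w2 = rng.normal(0.0, 0.1, n_hidden)
        self.b2 = 0.0

    def value(self, s):
        h = np.tanh(self.W1 @ s + self.b1)
        return self.w2 @ h + self.b2, h

    def td_update(self, s, r, s_next, done, alpha=1e-3, gamma=0.99):
        v, h = self.value(s)
        v_next, _ = self.value(s_next)
        target = r if done else r + gamma * v_next   # bootstrapped target
        delta = target - v                           # TD error
        # Semi-gradient: differentiate V(s) only; the target is held fixed.
        grad_pre = self.w2 * (1.0 - h**2)            # dV/d(pre-activation)
        self.w2 += alpha * delta * h
        self.b2 += alpha * delta
        self.W1 += alpha * delta * np.outer(grad_pre, s)
        self.b1 += alpha * delta * grad_pre
        return delta

# Hypothetical 4-D state, e.g. (x, y, cab angle, trailer angle).
net = ValueNet(n_in=4)
net.td_update(np.zeros(4), r=-1.0, s_next=0.1 * np.ones(4), done=False)
```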
Stochastic neurons are deployed for efficient adaptation of exploration parameters by gradient-following algorithms. The approach is evaluated in model-free temporal-difference learning with discrete actions. A particular advantage is memory efficiency, because exploratory data need only be memorized for starting states. Hence, if a learning problem consists of only one starting state...
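Below is a gradient-following update for a stochastic (Bernoulli) unit in the sense of Williams' REINFORCE, the classic algorithm of this family; using it to adapt an exploration parameter from starting states only is an assumption inspired by the abstract, not its exact method.

```python
# REINFORCE-style gradient-following step for a Bernoulli-logistic unit.
import numpy as np

rng = np.random.default_rng(1)

def reinforce_step(theta, x, reward, baseline=0.0, lr=0.05):
    """One update; theta parameterizes the unit's firing probability."""
    p = 1.0 / (1.0 + np.exp(-theta @ x))   # firing probability
    y = float(rng.random() < p)            # stochastic output in {0, 1}
    # Characteristic eligibility: d log P(y|x) / d theta = (y - p) * x.
    theta += lr * (reward - baseline) * (y - p) * x
    return y

theta = np.zeros(3)
reinforce_step(theta, x=np.array([1.0, 0.0, 1.0]), reward=1.0)
```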
Coordinating multiple agents that need to perform a sequence of actions to maximize a system-level reward requires solving two distinct credit assignment problems. First, credit must be assigned for an action taken at time step t that results in a reward at time step t′ > t. Second, credit must be assigned for the contribution of agent i to the overall system performance. The first credit assig...
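One common way to formalize the two problems, hedged in generic notation since the abstract does not state the paper's mechanism: temporal credit via discounted returns, structural credit via a difference reward.

```latex
% Temporal credit: discount rewards back to the actions that produced them,
G_t = \sum_{k \ge 0} \gamma^{k}\, r_{t+k} ;
% structural credit: score agent i by a difference reward that removes
% (or replaces with a default) agent i's contribution z_i from the joint z,
D_i = G(z) - G(z_{-i}) .
```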