Search results for: temporal difference learning
Number of results: 1,222,164
The ability to transfer knowledge from one domain to another is an important aspect of learning. Knowledge transfer increases learning efficiency by freeing the learner from duplicating past efforts. In this paper, we demonstrate how reinforcement learning agents can use relational representations to transfer knowledge across related domains.
Peter Dayan E25-210, MIT Cambridge, MA 02139 We provide a model of the standard watermaze task, and of a more challenging task involving novel platform locations, in which rats exhibit one-trial learning after a few days of training. The model uses hippocampal place cells to support reinforcement learning, and also, in an integrated manner, to build and use allocentric coordinates.
What is happiness for reinforcement learning agents? We seek a formal definition satisfying a list of desiderata. Our proposed definition of happiness is the temporal difference error, i.e. the difference between the value of the obtained reward and observation and the agent’s expectation of this value. This definition satisfies most of our desiderata and is compatible with empirical research o...
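The temporal difference error named in this abstract has a simple one-step form: the obtained reward plus the discounted value of the next observation, minus the agent's prior estimate. A minimal sketch, with illustrative values chosen here rather than taken from the paper:

```python
def td_error(reward, gamma, v_next, v_current):
    """One-step temporal difference error:
    delta = r + gamma * V(s') - V(s),
    i.e. obtained value minus the agent's expectation of it."""
    return reward + gamma * v_next - v_current

# Hypothetical example: reward 1.0, discount 0.9,
# V(s') = 2.0, V(s) = 2.5 -> delta = 1.0 + 1.8 - 2.5
delta = td_error(1.0, 0.9, 2.0, 2.5)
```

Under the paper's proposed definition, a positive `delta` (things went better than expected) would correspond to positive "happiness", and a negative one to the reverse.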
Exploration is still one of the crucial problems in reinforcement learning, especially for agents acting in safety-critical situations. We propose a new directed exploration method, based on a notion of state controllability. Intuitively, if an agent wants to stay safe, it should seek out states where the effects of its actions are easier to predict; we call such states more controllable. Our ma...
Value functions derived from Markov decision processes arise as a central component of algorithms, as well as performance metrics, in many statistics and engineering applications of machine learning. Computation of the solution to the associated Bellman equations is challenging in most practical cases of interest. A popular class of approximation techniques, known as temporal difference (TD) learning algorithms, are an impo...
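TD learning approximates the Bellman solution by bootstrapping from current value estimates instead of solving the equations directly. A minimal TD(0) sketch on an assumed 5-state random-walk chain (not an example from the paper above): states 0 and 4 are terminal, reaching state 4 yields reward 1, and each update moves V(s) toward the bootstrapped target.

```python
import random

def td0_chain(episodes=5000, alpha=0.1, gamma=1.0, seed=0):
    """TD(0) value estimation on a symmetric 5-state random walk.

    States 0..4; each episode starts at 2 and steps left or right
    with equal probability. Reaching state 4 gives reward 1; both
    endpoints terminate. True values are V(s) = s / 4 for s in 1..3.
    """
    rng = random.Random(seed)
    V = [0.0] * 5  # value estimates; terminal states stay at 0
    for _ in range(episodes):
        s = 2
        while s not in (0, 4):
            s_next = s + (1 if rng.random() < 0.5 else -1)
            r = 1.0 if s_next == 4 else 0.0
            # Bootstrapped target: r + gamma * V(s'), zero at terminals
            target = r + (0.0 if s_next in (0, 4) else gamma * V[s_next])
            V[s] += alpha * (target - V[s])  # TD(0) update via the TD error
            s = s_next
    return V

V = td0_chain()
# V[1], V[2], V[3] approach 0.25, 0.5, 0.75 on this chain
```

With a small constant step size the estimates fluctuate around the true values rather than converging exactly; decaying `alpha` over episodes would tighten them further.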
This paper demonstrates the use of pattern-weights in order to develop a strategy for an automated player of a non-cooperative version of the game of Diplomacy. Diplomacy is a multi-player, zero-sum and simultaneous-move game with imperfect information. Pattern-weights represent stored knowledge of various aspects of a game that are learned through experience. An automated computer player is deve...
APPLICATION OF TEMPORAL DIFFERENCE LEARNING TO THE GAME OF SNAKE Christopher Lockhart
Submitted in partial fulfillment of the course requirements for "Neural Networks", ECEN 5733, May 2000