Search results for: action value function

Number of results: 2,342,819

T Tsumoto

Brain-derived neurotrophic factor (BDNF) is known to play a role in experience-dependent plasticity of the developing visual cortex. For example, BDNF acutely enhances long-term potentiation and blocks long-term depression in the visual cortex of young rats. Such acute actions of BDNF are suggested to be mediated mainly through presynaptic mechanisms. A chronic application of BDNF to the visual cor...

Journal: JACIII, 2015
Takaaki Kobayashi, Takeshi Shibuya, Masahiko Morita

When applying reinforcement learning (RL) algorithms such as Q-learning to real-world applications, we must consider the influence of sensor noise. The simplest way to reduce this influence is to add other types of sensors, but doing so may enlarge the state space and probably increase redundancy. Conventional value-function approximators applied to RL in continuous state-action ...
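
As a quick illustration of the kind of method the abstract refers to, here is a minimal sketch of Q-learning with a linear value-function approximator over a continuous state (Python; the featurization, parameter names, and one-hot action encoding are illustrative assumptions, not taken from the paper):

    import numpy as np

    def featurize(state, action, n_actions, n_features):
        # Place the state's feature vector in the slot for the chosen action.
        phi = np.zeros(n_actions * n_features)
        phi[action * n_features:(action + 1) * n_features] = state
        return phi

    def q_learning_step(w, state, action, reward, next_state,
                        n_actions, n_features, alpha=0.1, gamma=0.99):
        # Semi-gradient TD(0) update of the linear action-value estimate w.
        phi = featurize(state, action, n_actions, n_features)
        q_next = max(w @ featurize(next_state, a, n_actions, n_features)
                     for a in range(n_actions))
        td_error = reward + gamma * q_next - w @ phi
        return w + alpha * td_error * phi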

2007
Yuan-Pao Hsu, Kao-Shing Hwang, Hsin-Yi Lin

This article presents an algorithm that combines a FAST-based algorithm (Flexible Adaptable-Size Topology), called ARM, with the Q-learning algorithm. The ARM is a self-organizing architecture. By dynamically adjusting the size of each neuron's sensitivity region and adaptively pruning redundant neurons, the ARM can preserve resources (available neurons) to accommodate more categories. The...
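
For a rough picture of the self-organizing idea (a hedged sketch, not the authors' ARM; the neuron-pruning step is omitted and all parameters are assumptions), a quantizer can map continuous inputs to discrete states for Q-learning by growing a neuron when no existing one is close enough and shrinking the winner's sensitivity region otherwise:

    import numpy as np

    class AdaptiveQuantizer:
        def __init__(self, init_radius=1.0, shrink=0.99):
            self.centers, self.radii = [], []
            self.init_radius, self.shrink = init_radius, shrink

        def state_of(self, x):
            x = np.asarray(x, dtype=float)
            if self.centers:
                dists = [np.linalg.norm(x - c) for c in self.centers]
                i = int(np.argmin(dists))
                if dists[i] <= self.radii[i]:
                    self.radii[i] *= self.shrink  # tighten the winning region
                    return i
            self.centers.append(x)                # allocate a new neuron
            self.radii.append(self.init_radius)
            return len(self.centers) - 1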

2000
Junichiro Yoshimoto, Shin Ishii, Masa-aki Sato

In this article, we propose a new reinforcement learning (RL) method for a system having continuous state and action spaces. Our RL method has an architecture like the actor-critic model. The critic tries to approximate the Q-function, which is the expected future return for the current state-action pair. The actor tries to approximate a stochastic soft-max policy defined by the Q-function. The ...
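
The critic/actor split described here can be sketched as follows (Python; the temperature tau, tabular Q, and Expected-Sarsa-style target are illustrative assumptions rather than the paper's exact method):

    import numpy as np

    def softmax_policy(q_row, tau=1.0):
        # Stochastic soft-max policy defined by the critic's Q-values.
        z = (q_row - q_row.max()) / tau
        p = np.exp(z)
        return p / p.sum()

    def critic_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99, tau=1.0):
        # Bootstrap the Q estimate on the soft-max policy at the next state.
        pi_next = softmax_policy(Q[s_next], tau)
        Q[s, a] += alpha * (r + gamma * (pi_next @ Q[s_next]) - Q[s, a])

Acting then amounts to sampling from softmax_policy(Q[s]) at each step.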

1983
Richard S. Sutton, Satinder Singh, David McAllester

We present a series of formal and empirical results comparing the efficiency of various policy-gradient methods—methods for reinforcement learning that directly update a parameterized policy according to an approximation of the gradient of performance with respect to the policy parameter. Such methods have recently become of interest as an alternative to value-function-based methods because of ...
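
The canonical example of such a method is REINFORCE; here is a minimal sketch with a tabular soft-max policy (the parameterization and names are illustrative assumptions):

    import numpy as np

    def grad_log_softmax(theta, s, a):
        # Gradient of log pi(a|s) for a tabular soft-max policy theta[s, a].
        p = np.exp(theta[s] - theta[s].max())
        p /= p.sum()
        g = np.zeros_like(theta)
        g[s] = -p
        g[s, a] += 1.0
        return g

    def reinforce_update(theta, episode, alpha=0.01, gamma=0.99):
        # episode: list of (state, action, reward) from the current policy.
        G, grad = 0.0, np.zeros_like(theta)
        for s, a, r in reversed(episode):
            G = r + gamma * G                 # return from this step onward
            grad += G * grad_log_softmax(theta, s, a)
        return theta + alpha * grad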

1997
Danny Birmingham

The analytic structure of the Regge action on a cone in d dimensions over a boundary of arbitrary topology is determined in simplicial minisuperspace. The minisuperspace is defined by the assignment of a single internal edge length to all 1-simplices emanating from the cone vertex, and a single boundary edge length to all 1-simplices lying on the boundary. The Regge action is analyzed in the sp...
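
For orientation, the Regge action in d dimensions has the standard hinge-sum form (conventions and the boundary normalization vary between papers, so take this as the generic expression rather than the paper's exact one):

    S_{\mathrm{Regge}} = \sum_{h \in \mathrm{int}} V_h \, \delta_h
                       + \sum_{h \in \partial M} V_h \, \psi_h ,
    \qquad
    \delta_h = 2\pi - \sum_{\sigma \supset h} \theta_\sigma(h) ,
    \qquad
    \psi_h = \pi - \sum_{\sigma \supset h} \theta_\sigma(h) ,

where V_h is the (d-2)-volume of the hinge h and \theta_\sigma(h) is the dihedral angle of the d-simplex \sigma at h.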

2008
Vali Derhami, Vahid Johari Majd, Majid Nili Ahmadabadi

This paper provides a new Fuzzy Reinforcement Learning (FRL) algorithm based on a critic-only architecture. The proposed algorithm, called Fuzzy Sarsa Learning (FSL), tunes the parameters of the conclusion parts of the Fuzzy Inference System (FIS) online. Our FSL is based on Sarsa, which approximates the Action Value Function (AVF) and is an on-policy method. In each rule, actions are selected accord...
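
The plain Sarsa update that FSL builds on is the standard on-policy TD rule (the fuzzy inference layer is omitted here; Q may be a 2-D array or a nested dict):

    def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.99):
        # On-policy TD(0): bootstrap on the action actually selected next.
        Q[s][a] += alpha * (r + gamma * Q[s_next][a_next] - Q[s][a])
        return Q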

2012
Steve Dini, Mark Serrano

Q-learning is a reinforcement learning technique that works by learning an action-value function that gives the expected utility of performing a given action in a given state and following a fixed policy thereafter. The basic implementation uses a q-table to store the data. With increasing complexity in the environment and the agent, this approach fails to scale well as the space requirements b...
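
A minimal sketch of the q-table implementation the abstract describes (Python; the dictionary keying and epsilon-greedy choice are common conventions, assumed here rather than taken from the paper):

    import random
    from collections import defaultdict

    q_table = defaultdict(float)  # Q(s, a), defaulting to 0 for unseen pairs

    def choose_action(state, actions, epsilon=0.1):
        if random.random() < epsilon:
            return random.choice(actions)                        # explore
        return max(actions, key=lambda a: q_table[(state, a)])   # exploit

    def update(state, action, reward, next_state, actions,
               alpha=0.1, gamma=0.99):
        best_next = max(q_table[(next_state, a)] for a in actions)
        q_table[(state, action)] += alpha * (
            reward + gamma * best_next - q_table[(state, action)])

The table holds one entry per visited (state, action) pair, which is exactly the memory growth the abstract points to.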

2007
Sertan Girgin, Faruk Polat, Reda Alhajj

This paper employs state similarity to improve reinforcement learning performance. This is achieved by first identifying states with similar sub-policies. Then, a tree is constructed to be used for locating common action sequences of states as derived from possible optimal policies. Such sequences are utilized for defining a similarity function between states, which is essential for reflecting ...
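
One hedged way to picture such a similarity function (the details below are illustrative assumptions, not the authors' construction): compare the action sequences a policy produces from each state and score the length of their common prefix.

    def action_sequence(policy, state, horizon=10):
        # `policy` is assumed to return (action, next_state).
        seq = []
        for _ in range(horizon):
            action, state = policy(state)
            seq.append(action)
        return seq

    def state_similarity(policy, s1, s2, horizon=10):
        a1 = action_sequence(policy, s1, horizon)
        a2 = action_sequence(policy, s2, horizon)
        lcp = 0
        for x, y in zip(a1, a2):  # length of the shared action prefix
            if x != y:
                break
            lcp += 1
        return lcp / horizon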

[Chart: number of search results per year]