Search results for: fuzzy SARSA algorithm

Number of results: 112,094

Thesis: Ministry of Science, Research and Technology - University of Qom - Faculty of Basic Sciences, 1387 (Solar Hijri)

No abstract available.

2003
Nathan Sprague, Dana H. Ballard

We present a new algorithm, GM-Sarsa(0), for finding approximate solutions to multiple-goal reinforcement learning problems that are modeled as composite Markov decision processes. In our formulation, different sub-goals are modeled as MDPs that are coupled by the requirement that they share actions. Existing reinforcement learning algorithms address similar problem formulations by fir...
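
Since this and several later results build on the Sarsa(0) update, a minimal tabular sketch may help orient readers. Everything here is an illustrative assumption rather than the paper's code: the `env` interface (`reset()` and `step(a)` returning `(next_state, reward, done)`), the hyperparameters, and the ε-greedy policy. GM-Sarsa(0) itself runs several such Q-tables, one per sub-goal MDP, coupled through a shared choice of action.

```python
import numpy as np

def sarsa0(env, n_states, n_actions, episodes=500,
           alpha=0.1, gamma=0.99, epsilon=0.1, seed=0):
    """Tabular on-policy Sarsa(0). `env` is a hypothetical episodic
    environment with reset() -> state and step(a) -> (state, reward, done)."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))

    def eps_greedy(s):
        # Explore with probability epsilon, otherwise act greedily on Q.
        if rng.random() < epsilon:
            return int(rng.integers(n_actions))
        return int(np.argmax(Q[s]))

    for _ in range(episodes):
        s = env.reset()
        a = eps_greedy(s)
        done = False
        while not done:
            s2, r, done = env.step(a)
            a2 = eps_greedy(s2)
            # On-policy TD(0) target: bootstraps on the action actually taken next.
            target = r + gamma * Q[s2, a2] * (not done)
            Q[s, a] += alpha * (target - Q[s, a])
            s, a = s2, a2
    return Q
```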

2014
Xiaoke Zhou, Fei Zhu, Quan Liu, Yuchen Fu, Wei Huang

Traffic problems often arise when vehicle demand exceeds road capacity. Maximizing traffic flow and minimizing the average waiting time are the goals of intelligent traffic control. Each junction seeks a larger share of the traffic flow; in the process, junctions form a coordination policy, along with constraints on adjacent junctions, to maximize their own interests. A good t...

2016
Nihal ALTUNTAŞ, Erkan İMAL, Nahit EMANET, Ceyda Nur ÖZTÜRK

In recent decades, reinforcement learning (RL) has been widely used in research fields ranging from psychology to computer science. The infeasibility of sampling all possibilities in continuous-state problems and the absence of an explicit teacher make RL algorithms preferable to supervised learning in machine learning, as the optimal control problem has become a popular su...

Journal: CoRR, 2018
Long Yang, Minhao Shi, Qian Zheng, Wenjia Meng, Gang Pan

Recently, a new multi-step temporal-difference learning algorithm, called Q(σ), was introduced; it unifies n-step Tree-Backup (when σ = 0) and n-step Sarsa (when σ = 1) through a sampling parameter σ. However, like other multi-step temporal-difference learning algorithms, Q(σ) requires considerable memory and computation time. The eligibility trace is an important mechanism for transforming off-line updates into e...
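
As a quick reference for how σ interpolates between the two updates the abstract mentions, below is a sketch of the one-step, on-policy Q(σ) backup target. The function name and argument layout are assumptions for illustration; the paper's n-step, off-policy form additionally weights the sampled branch by an importance-sampling ratio.

```python
import numpy as np

def q_sigma_target(r, gamma, q_next, pi_next, a_next, sigma):
    """One-step Q(sigma) backup target.

    r       -- immediate reward
    gamma   -- discount factor
    q_next  -- Q-values at the next state, shape (n_actions,)
    pi_next -- policy probabilities at the next state, shape (n_actions,)
    a_next  -- action actually taken at the next state
    sigma   -- sampling parameter in [0, 1]
    """
    sample_backup = q_next[a_next]        # sigma = 1 recovers the Sarsa target
    expected_backup = pi_next @ q_next    # sigma = 0 recovers the Tree-Backup target
    return r + gamma * (sigma * sample_backup + (1.0 - sigma) * expected_backup)
```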

Journal: CoRR, 2018
Jaden B. Travnik, Kory Wallace Mathewson, Richard S. Sutton, Patrick M. Pilarski

The relationship between a reinforcement learning (RL) agent and an asynchronous environment is often ignored. Frequently used models of the interaction between an agent and its environment, such as Markov Decision Processes (MDPs) or Semi-Markov Decision Processes (SMDPs), do not capture the fact that, in an asynchronous environment, the state of the environment may change during computation per...

[Chart: number of search results per year]