Search results for: fuzzy sarsa algorithm

Number of results: 112094

The effectiveness of swarm-intelligence search and optimization methods has markedly increased researchers' interest in applying them to a range of complex engineering problems. One such swarm-intelligence algorithm is the gravitational search algorithm (GSA), which is inspired by the physical laws of gravitational attraction and Newtonian motion and drives the members of the population, which are in effect random masses in space, to search that space. This paper presents a new population model...

Zakaria Jalali, Seyed Mehdi Mousavi Nasab

Given the importance and wide use of the rock mass rating classification system in rock engineering, this paper aims to correct the final classes of that system using the k-means and fuzzy c-means (FCM) clustering algorithms. In the rock mass rating system, data are classified from a set of initial information on the basis of empirical opinions and judgments, but when clustering algorithms are applied to this classification system, the classes...

Journal: Vehicles 2022

A real-time, metadata-driven electric vehicle routing optimization to reduce on-road energy requirements is proposed in this work. The strategy employs the state–action–reward–state–action (SARSA) algorithm to learn the EV’s maximum travel policy as an agent. As a function of the received reward signal, the model evaluates optimal behavior. Markov chain models (MCMs) are used to estimate the agent’s [...] on the road, which si...
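For readers unfamiliar with the algorithm named in this abstract, the following is a minimal sketch of a tabular SARSA update, not the method used in the paper. The function names, the dictionary-based Q-table, and the default values of `alpha`, `gamma`, and `epsilon` are all assumptions made for illustration.

```python
import random


def epsilon_greedy(q, state, actions, epsilon, rng):
    """Pick a random action with probability epsilon, otherwise a greedy one."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    # Unseen (state, action) pairs are treated as having value 0.0.
    return max(actions, key=lambda a: q.get((state, a), 0.0))


def sarsa_update(q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """Apply one on-policy update:
    Q(s, a) <- Q(s, a) + alpha * (r + gamma * Q(s', a') - Q(s, a)).
    """
    old = q.get((s, a), 0.0)
    target = r + gamma * q.get((s_next, a_next), 0.0)
    q[(s, a)] = old + alpha * (target - old)
    return q[(s, a)]


if __name__ == "__main__":
    rng = random.Random(0)
    q = {}
    actions = ["left", "right"]  # hypothetical action set
    a = epsilon_greedy(q, "s0", actions, epsilon=0.1, rng=rng)
    a_next = epsilon_greedy(q, "s1", actions, epsilon=0.1, rng=rng)
    print(sarsa_update(q, "s0", a, 1.0, "s1", a_next))
```

The update uses the action actually chosen in the next state (`a_next`), which is what makes SARSA on-policy. This sketch does not handle terminal states, where the next-state value would normally be taken as zero.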

2001
Peter Stone, Richard S. Sutton

RoboCup simulated soccer presents many challenges to reinforcement learning methods, including a large state space, hidden and uncertain state, multiple agents, and long and variable delays in the effects of actions. We describe our application of episodic SMDP Sarsa(λ) with linear tile-coding function approximation and variable λ to learning higher-level decisions in a keepaway subtask of RoboCup...

2018
Ofir Nachum, Mohammad Norouzi, George Tucker, Dale Schuurmans

State-action value functions (i.e., Q-values) are ubiquitous in reinforcement learning (RL), giving rise to popular algorithms such as SARSA and Q-learning. We propose a new notion of action value defined by a Gaussian smoothed version of the expected Q-value. We show that such smoothed Q-values still satisfy a Bellman equation, making them learnable from experience sampled from an environment. ...

1998
Eduardo Sanchez

Blackjack or twenty-one is a card game where the player attempts to beat the dealer, by obtaining a sum of card values that is equal to or less than 21 so that his total is higher than the dealer's. The probabilistic nature of the game makes it an interesting testbed problem for learning algorithms, though the problem of learning a good playing strategy is not obvious. Learning with a teacher ...

2008
J. J. Lee, B. G. Shin

A robot’s ability to control its posture in the air is required when designing advanced robots that can run, jump, and land, and that can perform tasks in workplaces where ordinary robots cannot go. Using such a robot could improve human safety as well as reduce cost. In this paper, we describe a method for controlling a robot’s posture while falling, for safe landing, using reinforcement learning (RL). The...

2012
Jaroslav E. Poliscuk

This article analyzes a reinforcement learning method in which a subject of learning is defined. The essence of this method is the selection of activities by a trial-and-error process and the awarding of deferred rewards. If an environment is characterized by the Markov property, then step-by-step dynamics will enable forecasting of subsequent conditions and awarding subsequent rewards on the basi...

2017
Aurko Roy, Huan Xu, Sebastian Pokutta

We study reinforcement learning under model misspecification, where we do not have access to the true environment but only to a reasonably close approximation to it. We address this problem by extending the framework of robust MDPs of [2, 17, 13] to the model-free Reinforcement Learning setting, where we do not have access to the model parameters, but can only sample states from it. We define ro...

2004
Douglas Aberdeen

Reinforcement learning (RL) algorithms attempt to assign the credit for rewards to the actions that contributed to the reward. Thus far, credit assignment has been done in one of two ways: uniformly, or using a discounting model that assigns exponentially more credit to recent actions. This paper demonstrates an alternative approach to temporal credit assignment, taking advantage of exact or ap...
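The two conventional credit-assignment schemes mentioned in this abstract can be illustrated with a small sketch. It is not taken from the paper: the function names, the normalization of the weights to sum to 1, and the default `gamma` are assumptions made for illustration.

```python
def uniform_credit(n_steps):
    """Give every one of the n_steps actions the same share of the reward."""
    return [1.0 / n_steps] * n_steps


def discounted_credit(n_steps, gamma=0.9):
    """Give action t a weight proportional to gamma ** (steps before the reward).

    The last action in the list is the most recent one, so it gets the largest
    weight. Weights are normalized to sum to 1 (an assumption of this sketch).
    """
    raw = [gamma ** (n_steps - 1 - t) for t in range(n_steps)]
    total = sum(raw)
    return [w / total for w in raw]


if __name__ == "__main__":
    print(uniform_credit(4))
    print(discounted_credit(4, gamma=0.5))
```

With `gamma` below 1, each action receives `1 / gamma` times the credit of the action just before it, which is the exponential preference for recent actions that the abstract describes.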
