Fuzzy SARSA algorithm

Improving the Gravitational Search Algorithm (GSA) Using Fuzzy Logic

Journal: Intelligent Methods in the Power Industry 2013

Omid Mokhlesi, Seyed Hamid Zahiri, Seyed Mohammad Razavi, Naser Mehrshad

The effectiveness of swarm-intelligence search and optimization methods has markedly increased researchers' interest in applying them to a range of complex engineering problems. One such swarm-intelligence algorithm is the gravitational search algorithm (GSA). Inspired by the physical laws of gravitational attraction and Newtonian motion, it drives the members of the population, which are in effect random masses in space, to search that space. This paper presents a new population model...


Correcting the Rock Mass Rating Classification System Using the k-means and fuzzy c-means Clustering Algorithms

Journal: Analytical and Numerical Methods in Mining Engineering 2015

Zakaria Jalali, Seyed Mehdi Mousavi Nasab

Given the importance and wide use of the rock mass rating classification system in rock engineering, the aim of this paper is to correct the final classes of this system using the k-means and fuzzy c-means (FCM) clustering algorithms. In the rock mass rating system, data are classified from a set of initial information on the basis of empirical theories and judgments, but when clustering algorithms are applied to this classification system, the classes...


A Real-Time Energy Consumption Minimization Framework for Electric Vehicles Routing Optimization Based on SARSA Reinforcement Learning

Journal: Vehicles 2022

A real-time, metadata-driven electric vehicle routing optimization to reduce on-road energy requirements is proposed in this work. The strategy employs the state–action–reward–state–action (SARSA) algorithm to learn the EV's maximum travel policy as an agent. As a function of the received reward signal, the model evaluates the optimal behavior. Markov chain models (MCMs) are used to estimate the agent's energy requirements on the road, which si...


Scaling Reinforcement Learning toward RoboCup Soccer

2001

Peter Stone, Richard S. Sutton

RoboCup simulated soccer presents many challenges to reinforcement learning methods, including a large state space, hidden and uncertain state, multiple agents, and long and variable delays in the effects of actions. We describe our application of episodic SMDP Sarsa(λ) with linear tile-coding function approximation and variable λ to learning higher-level decisions in a keepaway subtask of RoboCup...


Smoothed Action Value Functions for Learning Gaussian Policies

2018

Ofir Nachum, Mohammad Norouzi, George Tucker, Dale Schuurmans

State-action value functions (i.e., Q-values) are ubiquitous in reinforcement learning (RL), giving rise to popular algorithms such as SARSA and Q-learning. We propose a new notion of action value defined by a Gaussian smoothed version of the expected Q-value. We show that such smoothed Q-values still satisfy a Bellman equation, making them learnable from experience sampled from an environment. ...


Intl. Joint Conf. on Neural Networks (IJCNN '98), Anchorage

1998

Eduardo Sanchez

Blackjack or twenty-one is a card game where the player attempts to beat the dealer, by obtaining a sum of card values that is equal to or less than 21 so that his total is higher than the dealer's. The probabilistic nature of the game makes it an interesting testbed problem for learning algorithms, though the problem of learning a good playing strategy is not obvious. Learning with a teacher ...


Posture Control of a Free Falling Robotic Cat for Soft Landing Using Reinforcement Learning

2008

J. J. Lee, B. G. Shin

Robot’s posture control ability in the air is required when designing advanced robots that can run, jump and land, which can perform tasks in workplaces where ordinary robots cannot go. Using such a robot could afford human safety as well as cost reduction. In this paper, we describe the control method of robot’s posture in its falling for the safe landing using reinforcement learning (RL). The...


The Analysis of Experimental Results of Machine Learning Approach

2012

Jaroslav E. Poliscuk

This article analyzes a reinforcement learning method in which a subject of learning is defined. The essence of this method is the selection of activities by a trial-and-error process and the awarding of deferred rewards. If an environment is characterized by the Markov property, then step-by-step dynamics will enable forecasting of subsequent conditions and awarding subsequent rewards on the basi...


Reinforcement Learning under Model Mismatch

2017

Aurko Roy, Huan Xu, Sebastian Pokutta

We study reinforcement learning under model misspecification, where we do not have access to the true environment but only to a reasonably close approximation to it. We address this problem by extending the framework of robust MDPs of [2, 17, 13] to the model-free Reinforcement Learning setting, where we do not have access to the model parameters, but can only sample states from it. We define ro...


Filtered Reinforcement Learning

2004

Douglas Aberdeen

Reinforcement learning (RL) algorithms attempt to assign the credit for rewards to the actions that contributed to the reward. Thus far, credit assignment has been done in one of two ways: uniformly, or using a discounting model that assigns exponentially more credit to recent actions. This paper demonstrates an alternative approach to temporal credit assignment, taking advantage of exact or ap...
