Search results for: fuzzy SARSA algorithm
Number of results: 112,094
The efficiency of swarm-intelligence search and optimization methods has dramatically increased researchers' interest in applying them to a variety of complex engineering problems. One such swarm-intelligence algorithm is the Gravitational Search Algorithm (GSA), which, inspired by the physical laws of gravitational attraction and Newtonian motion, drives the members of the population (in effect, random masses in space) to search that space. This paper presents a new population model...
Given the importance and widespread use of the Rock Mass Rating classification system in rock engineering, this paper aims to correct the final classes of this classification system using the k-means and fuzzy c-means (FCM) clustering algorithms. In the Rock Mass Rating system, data are classified on the basis of a set of initial information grounded in empirical opinions and judgments; by applying clustering algorithms to this classification system, however, the class...
A real-time, metadata-driven electric vehicle routing optimization to reduce on-road energy requirements is proposed in this work. The strategy employs the state–action–reward–state–action (SARSA) algorithm to learn the EV's maximum travel policy as an agent. As a function of the received reward signal, the model evaluates the optimal behavior. Markov chain models (MCMs) are used to estimate the agent's … on the road, which si...
RoboCup simulated soccer presents many challenges to reinforcement learning methods, including a large state space, hidden and uncertain state, multiple agents, and long and variable delays in the effects of actions. We describe our application of episodic SMDP Sarsa(λ) with linear tile-coding function approximation and variable λ to learning higher-level decisions in a keepaway subtask of RoboCup...
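The tile-coding feature representation mentioned in this abstract can be illustrated with a minimal one-dimensional sketch (this is an assumption-laden toy version, not the paper's implementation; the function name and parameters are illustrative). Each of several tilings is shifted by a fraction of a tile width, so nearby states share many, but not all, active features, and a linear value function is then a sum of weights at the active indices.

```python
def tile_indices(x, n_tilings=4, n_tiles=8, low=0.0, high=1.0):
    """Active feature indices for a 1-D state under simple tile coding.

    Each tiling is offset by a fraction of a tile width; each tiling
    owns its own contiguous block of (n_tiles + 1) indices.
    """
    width = (high - low) / n_tiles
    idxs = []
    for t in range(n_tilings):
        offset = t * width / n_tilings
        tile = int((x - low + offset) / width)
        tile = min(tile, n_tiles)  # the shift can spill into one extra tile
        idxs.append(t * (n_tiles + 1) + tile)
    return idxs

# A linear approximate value is then just a sum over active features:
# q = sum(weights[i] for i in tile_indices(x))
print(tile_indices(0.5))   # one active tile per tiling
```

Generalizing to multi-dimensional states (as in keepaway) typically adds hashing to keep the feature vector a manageable size.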
State-action value functions (i.e., Q-values) are ubiquitous in reinforcement learning (RL), giving rise to popular algorithms such as SARSA and Q-learning. We propose a new notion of action value defined by a Gaussian smoothed version of the expected Q-value. We show that such smoothed Q-values still satisfy a Bellman equation, making them learnable from experience sampled from an environment. ...
Blackjack or twenty-one is a card game where the player attempts to beat the dealer, by obtaining a sum of card values that is equal to or less than 21 so that his total is higher than the dealer's. The probabilistic nature of the game makes it an interesting testbed problem for learning algorithms, though the problem of learning a good playing strategy is not obvious. Learning with a teacher ...
The ability to control a robot's posture in the air is required when designing advanced robots that can run, jump, and land, and that can perform tasks in workplaces where ordinary robots cannot go. Using such a robot could enhance human safety as well as reduce costs. In this paper, we describe a method for controlling the robot's posture during a fall, for safe landing, using reinforcement learning (RL). The...
This article analyzes a reinforcement learning method in which a subject of learning is defined. The essence of this method is the selection of activities through a trial-and-error process and the awarding of deferred rewards. If an environment is characterized by the Markov property, then its step-by-step dynamics enable forecasting of subsequent conditions and subsequent rewards on the basi...
We study reinforcement learning under model misspecification, where we do not have access to the true environment but only to a reasonably close approximation to it. We address this problem by extending the framework of robust MDPs of [2, 17, 13] to the model-free reinforcement learning setting, where we do not have access to the model parameters, but can only sample states from it. We define ro...
Reinforcement learning (RL) algorithms attempt to assign the credit for rewards to the actions that contributed to the reward. Thus far, credit assignment has been done in one of two ways: uniformly, or using a discounting model that assigns exponentially more credit to recent actions. This paper demonstrates an alternative approach to temporal credit assignment, taking advantage of exact or ap...
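The exponential-discounting credit model this abstract contrasts against is commonly realized with accumulating eligibility traces. The sketch below is a minimal illustration of that baseline (names and constants are illustrative assumptions, not the paper's proposed alternative): each past state-action pair's trace decays geometrically, and a TD error is then applied in proportion to the traces.

```python
def decay_and_bump(traces, key, gamma=0.99, lam=0.9):
    """Exponential credit assignment: every trace decays by gamma*lam,
    and the just-visited state-action pair's trace is incremented."""
    for k in traces:
        traces[k] *= gamma * lam
    traces[key] = traces.get(key, 0.0) + 1.0
    return traces

def apply_td_error(Q, traces, delta, alpha=0.1):
    """Each pair receives credit for the TD error proportional to its trace."""
    for k, e in traces.items():
        Q[k] = Q.get(k, 0.0) + alpha * delta * e
    return Q

traces = {}
decay_and_bump(traces, ("s0", "a0"))
decay_and_bump(traces, ("s1", "a0"))
print(traces[("s0", "a0")])  # decayed once: 0.99 * 0.9 = 0.891
```

Under this model, credit for a reward falls off geometrically with the number of steps since an action, which is exactly the uniformity the abstract's alternative approach seeks to relax.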