Search results for: q learning
Number of results: 717428
This thesis examines the bidding strategy of energy producers in a uniform-price electricity market, with the aim of maximizing profit. Energy sellers submit their price bids to the system operator for a specific hour of the day using three different algorithms: Q-learning, R-learning, and SARSA. These reinforcement learning algorithms are combined with the simulated annealing action-selection method. The profit obtained for the seller...
Article history: Received 28 March 2009 Received in revised form 16 May 2012 Accepted 3 June 2012 Available online 9 June 2012
This report details the implementation of three reinforcement learning methods, Monte Carlo, SARSA, and Q-Learning, and compares their performance in the Windy and CliffWalking Gridworlds.
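As a rough illustration of what separates two of the methods named in this entry, the following is a minimal sketch of the SARSA and Q-learning temporal-difference targets. It is not taken from the report: the function names, the dictionary-based value table, and the toy numbers are all assumptions made for illustration.

```python
def q_learning_target(q, reward, next_state, gamma):
    """Off-policy target: bootstrap from the best action value in next_state."""
    return reward + gamma * max(q[next_state].values())


def sarsa_target(q, reward, next_state, next_action, gamma):
    """On-policy target: bootstrap from the action actually chosen next."""
    return reward + gamma * q[next_state][next_action]


def td_update(q, state, action, target, alpha):
    """Move Q(state, action) a fraction alpha of the way toward the target."""
    q[state][action] += alpha * (target - q[state][action])
    return q[state][action]


# Hypothetical two-state table; the values are invented for the example.
q = {
    "s0": {"left": 0.0, "right": 0.0},
    "s1": {"left": 1.0, "right": 3.0},
}

# Same transition (s0, right, reward -1, s1), two different targets.
off_policy = q_learning_target(q, reward=-1.0, next_state="s1", gamma=0.9)
on_policy = sarsa_target(q, reward=-1.0, next_state="s1",
                         next_action="left", gamma=0.9)
```

The two targets differ only when the action taken in the next state is not the greedy one, as with the exploratory `left` move here. That difference is commonly cited as the reason the two methods can learn different paths in cliff-style gridworlds.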
This paper studies the problem of pruning an ensemble of classifiers from a Reinforcement Learning perspective. It contributes a new pruning approach that uses the Q-learning algorithm in order to approximate an optimal policy of choosing whether to include or exclude each classifier from the ensemble. Extensive experimental comparisons of the proposed approach against state-of-the-art pruning ...
The life-long learning architecture attempts to create an adaptive agent through the incorporation of prior knowledge over the lifetime of a learning agent. Our paper focuses on task transfer in reinforcement learning and specifically in Q-learning. There are three main model-free methods for performing task transfer in Q-learning: direct transfer, soft transfer and memory-guided exploration. In ...
In this paper ε-MDP-models are introduced and convergence theorems are proven using the generalized MDP framework of Szepesvári and Littman. Using this model family, we show that Q-learning is capable of finding near-optimal policies in varying environments. The potential of this new family of MDP models is illustrated via a reinforcement learning algorithm called event-learning which separates...
We study a classification problem where each feature can be acquired for a cost and the goal is to optimize the trade-off between classification precision and the total feature cost. We frame the problem as a sequential decision-making problem, where we classify one sample in each episode. At each step, an agent can use values of acquired features to decide whether to purchase another one or wh...
In this paper, a Q-Learning method for transfer scheduling of freight cars in a train is proposed. In the proposed method, the number of freight movements needed to line up the freights in the desired order is reflected in the evaluation value for each pair of freight layout and removal destination at a freight yard. The best transfer scheduling can be derived by selecting the removal-action of freight t...