Search results for: q learning
Number of results: 717428
Temporal-difference-based deep-reinforcement learning methods have typically been driven by off-policy, bootstrap Q-Learning updates. In this paper, we investigate the effects of using on-policy, Monte Carlo updates. Our empirical results show that for the DDPG algorithm in a continuous action space, mixing on-policy and off-policy update targets exhibits superior performance and stability comp...
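As a rough illustration of what "mixing on-policy and off-policy update targets" could look like, the sketch below blends a one-step bootstrapped target with a Monte Carlo return. The weight `beta`, the function names, and the linear blend itself are assumptions made for illustration; the truncated abstract does not specify the exact mixing rule.

```python
def discounted_returns(rewards, gamma):
    """Monte Carlo (on-policy) return G_t for every step of one finished episode."""
    returns = [0.0] * len(rewards)
    g = 0.0
    # Walk backwards so each return reuses the one that follows it.
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g
        returns[t] = g
    return returns


def mixed_target(reward, next_q, mc_return, gamma, beta):
    """Blend a bootstrapped one-step target with a Monte Carlo return.

    beta = 0 gives the pure bootstrapped (off-policy style) target,
    beta = 1 gives the pure Monte Carlo (on-policy) target.
    `beta` is a hypothetical mixing weight, not a value from the abstract.
    """
    bootstrap = reward + gamma * next_q
    return beta * mc_return + (1.0 - beta) * bootstrap


if __name__ == "__main__":
    rewards = [1.0, 1.0, 1.0]
    returns = discounted_returns(rewards, gamma=0.5)
    # next_q stands in for a critic's estimate of the next state-action value.
    print(mixed_target(rewards[0], next_q=2.0, mc_return=returns[0], gamma=0.5, beta=0.5))
```

In a DDPG-style setup, `next_q` would come from a target critic evaluated at the target actor's action, and the blended value would replace the usual critic regression target.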
In this paper, we propose a hierarchical reinforcement learning architecture for a robot with many degrees of freedom. In order to enable learning in a practical number of trials, we introduce a low-dimensional representation of the state of the robot for higher-level planning. The upper level learns a discrete sequence of sub-goals in a low-dimensional state space for achieving the main goal...
In some stochastic environments the well-known reinforcement learning algorithm Q-learning performs very poorly. This poor performance is caused by large overestimations of action values. These overestimations result from a positive bias that is introduced because Q-learning uses the maximum action value as an approximation for the maximum expected action value. We introduce an alternative way ...
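The positive bias described above can be seen in a small simulation. In this minimal sketch every action has a true expected value of 0, yet the maximum of the noisy sample means is positive on average. The number of actions, the sample counts, and the unit-variance Gaussian reward noise are assumptions chosen for illustration, not settings from the abstract.

```python
import random


def max_of_sample_means(n_actions, n_samples, rng):
    """Max over actions of the sample-mean reward; every true action value is 0."""
    means = []
    for _ in range(n_actions):
        samples = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
        means.append(sum(samples) / n_samples)
    return max(means)


def estimate_bias(n_actions=10, n_samples=5, n_trials=2000, seed=0):
    """Average the max estimate over many trials; the unbiased answer would be 0."""
    rng = random.Random(seed)
    total = sum(max_of_sample_means(n_actions, n_samples, rng) for _ in range(n_trials))
    return total / n_trials


if __name__ == "__main__":
    print(f"10 actions: average max estimate = {estimate_bias():.3f}")
    print(f" 1 action : average max estimate = {estimate_bias(n_actions=1):.3f}")
```

With a single action there is no maximization and the average stays near 0. With several actions the average is clearly positive, which is the overestimation the abstract attributes to using the maximum action value as an approximation for the maximum expected action value.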
Abstract. There are only a few learning algorithms applicable to stochastic dynamic games. Learning in games is generally difficult because of the non-stationary environment in which each decision maker aims to learn its optimal decisions with minimal information in the presence of the other decision makers who are also learning. In the case of dynamic games, learning is more challenging becaus...
This is the team description of Osaka University "Trackies" for RoboCup-99. We have worked on two issues for our new team. First, we have changed our robot system from a remote-controlled vehicle to a self-contained robot. Second, we have proposed a new learning method based on Q-learning so that a real robot can acquire a behavior by reinforcement learning.
In this paper, we describe cooperative transportation to a target position with two humanoid robots and introduce a machine learning approach to solving the problem. The difficulty of the task lies in the fact that each robot's position shifts with the other's while they are moving. Therefore, it is necessary to correct the position in real time. However, it is difficult to generate such an ...
We consider a prospect-theoretic version of the classical Q-learning algorithm for discounted-reward Markov decision processes, wherein the controller perceives a distorted and noisy future reward, modeled by a nonlinearity that accentuates gains and under-represents losses relative to a reference point. We analyze the asymptotic behavior of the scheme by analyzing its limiting differential equation using the theory of monotone dyn...
Reinforcement learning systems improve behaviour based on scalar rewards from a critic. In this work, two vision-based behaviours, servoing and wandering, are learned through a Q-learning method which handles continuous states and actions. There is no requirement for camera calibration, an actuator model, or a knowledgeable teacher. Learning through observing the actions of other behaviours improves ...