Search results for: q algorithm
Number of results: 863118
This paper presents an algorithm to average a set of quaternion observations. The average quaternion is determined by minimizing the weighted sum of the squared Frobenius norms of the corresponding attitude matrix differences, subject to the unit-norm constraint in the determined solution. Two cases are presented: one that incorporates scalar weights and one that incorporates general weights on...
We present CLASSQ-L (for: class Q-learning), an application of the Q-learning reinforcement learning algorithm to play complete Wargus games. Wargus is a real-time strategy game where players control armies consisting of units of different classes (e.g., archers, knights). CLASSQ-L uses a single table for each class of unit, so that each unit is controlled by and updates its class's Q-table. This enab...
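The per-class table idea in the abstract above can be illustrated with a minimal sketch. It assumes the standard tabular Q-learning update and is not CLASSQ-L's actual implementation; the names `ClassQTables`, `alpha`, `gamma`, and the toy states and actions are hypothetical.

```python
from collections import defaultdict


class ClassQTables:
    """One Q-table per unit class, shared by every unit of that class (hypothetical helper)."""

    def __init__(self, alpha=0.1, gamma=0.9):
        self.alpha = alpha  # learning rate (assumed value)
        self.gamma = gamma  # discount factor (assumed value)
        # tables[unit_class][(state, action)] -> estimated action value
        self.tables = defaultdict(lambda: defaultdict(float))

    def best_value(self, unit_class, state, actions):
        """Largest estimated value over the given actions, 0.0 if there are none."""
        table = self.tables[unit_class]
        return max((table[(state, a)] for a in actions), default=0.0)

    def update(self, unit_class, state, action, reward, next_state, next_actions):
        """Standard Q-learning update applied to the table of the unit's class."""
        table = self.tables[unit_class]
        target = reward + self.gamma * self.best_value(unit_class, next_state, next_actions)
        table[(state, action)] += self.alpha * (target - table[(state, action)])
        return table[(state, action)]


q = ClassQTables(alpha=0.5, gamma=0.9)
# Two different archers report the same transition; both write to the archer table.
first = q.update("archer", "s0", "attack", 1.0, "s1", ["attack", "retreat"])
second = q.update("archer", "s0", "attack", 1.0, "s1", ["attack", "retreat"])
print(first, second)  # 0.5 0.75
```

Because every unit of a class writes to the same table, experience gathered by one archer immediately changes the estimates used by the others, while the knight table stays untouched.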
Existing models of the evolution of social behavior typically involve innate strategies such as tit-for-tat. Yet both behavioral and neural evidence indicates a substantial role for learned social behavior. We explore the evolutionary dynamics of two simple social behaviors among learning agents: theft and punishment. In our simulation, agents employ Q-learning, a common reinforcement learning...
We show how an ensemble of Q-functions can be leveraged for more effective exploration in deep reinforcement learning. We build on well-established algorithms from the bandit setting, and adapt them to the Q-learning setting. We propose an exploration strategy based on upper-confidence bounds (UCB). Our experiments show significant gains on the Atari benchmark.
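A minimal sketch of UCB-style action selection over an ensemble, assuming the bonus is the ensemble's standard deviation scaled by a constant. The abstract does not give the exact rule, so `ucb_action`, `bonus_scale`, and the toy ensemble are hypothetical.

```python
import statistics


def ucb_action(q_ensemble, state, actions, bonus_scale=1.0):
    """Pick the action maximizing ensemble mean plus a scaled ensemble spread."""
    best_action, best_score = None, float("-inf")
    for action in actions:
        estimates = [q(state, action) for q in q_ensemble]
        # Disagreement between ensemble members stands in for uncertainty.
        score = statistics.fmean(estimates) + bonus_scale * statistics.pstdev(estimates)
        if score > best_score:
            best_action, best_score = action, score
    return best_action


# Toy ensemble: two Q-functions that agree on "left" and disagree on "right".
tables = [{"left": 1.0, "right": 0.0}, {"left": 1.0, "right": 1.6}]
ensemble = [lambda s, a, t=t: t[a] for t in tables]

greedy = ucb_action(ensemble, "s", ["left", "right"], bonus_scale=0.0)
optimistic = ucb_action(ensemble, "s", ["left", "right"], bonus_scale=1.0)
print(greedy, optimistic)  # left right
```

With no bonus the higher mean wins ("left"); with the bonus, the action the ensemble disagrees about is tried instead, which is the exploratory behavior an upper-confidence bound is meant to produce.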
The efficient utilization of underutilized spectrum is the main theme of current research. Cognitive radio, with the help of a Q-learning algorithm, is used to detect the presence of primary signals and to utilize the spectrum in their absence. The proposed Q-learning model identifies previously known signals and learns to detect the signals which otherwise could not be ...
We have proposed the utility-based Q-learning concept that supposes an agent internally has an emotional mechanism that derives subjective utilities from objective rewards and the agent uses the utilities as rewards of Q-learning. We have also proposed such an emotional mechanism that facilitates cooperative actions in Prisoner’s Dilemma (PD) games. However, this mechanism has been designed and...
We consider a new algorithm for reinforcement learning called A-learning. A-learning learns the advantages from a single training set. We compare A-learning with function approximation to Q-learning with function approximation and find that because A-learning approximates only the advantages it is less likely to exhibit bias due to the function approximation as compared to Q-learning. W...
We present three deterministic parameterized algorithms for well-studied packing and matching problems, namely, Weighted q-Dimensional p-Matching ((q, p)-WDM) and Weighted q-Set p-Packing ((q, p)-WSP). More specifically, we present an O(2.85043) time deterministic algorithm for (q, p)-WDM, an O(8.04143) time deterministic algorithm for the unweighted version of (3, p)-WDM, and an O((0.56201 · 2....