q algorithm

Search results for: q algorithm

Number of results: 863118

Aas 07-213 Averaging Quaternions

2007

Yang Cheng F. Landis Markley John L. Crassidis Yaakov Oshman

This paper presents an algorithm to average a set of quaternion observations. The average quaternion is determined by minimizing the weighted sum of the squared Frobenius norms of the corresponding attitude matrix differences, subject to the unit-norm constraint in the determined solution. Two cases are presented: one that incorporates scalar weights and one that incorporates general weights on...
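The scalar-weight case can be sketched as follows. The sketch relies on the commonly cited result that the minimizer is the dominant eigenvector of M = Σ wᵢ qᵢ qᵢᵀ. The truncated abstract does not state this, so treat it as an assumption. The function name `average_quaternions`, the scalar-last component order, and the use of power iteration are illustrative choices, not the authors' implementation.

```python
import math

def average_quaternions(quats, weights=None, iters=200):
    """Scalar-weighted quaternion average (hypothetical helper).

    Assumes the average is the dominant eigenvector of
    M = sum_i w_i * q_i * q_i^T, found here by power iteration.
    """
    if weights is None:
        weights = [1.0] * len(quats)

    # Accumulate the symmetric 4x4 matrix M from normalized quaternions.
    # q and -q contribute identically, so the sign ambiguity is harmless.
    M = [[0.0] * 4 for _ in range(4)]
    for q, w in zip(quats, weights):
        norm = math.sqrt(sum(c * c for c in q))
        u = [c / norm for c in q]
        for i in range(4):
            for j in range(4):
                M[i][j] += w * u[i] * u[j]

    # Power iteration, started from the first observation.
    # (A start vector exactly orthogonal to the answer would not converge.)
    norm = math.sqrt(sum(c * c for c in quats[0]))
    v = [c / norm for c in quats[0]]
    for _ in range(iters):
        mv = [sum(M[i][j] * v[j] for j in range(4)) for i in range(4)]
        norm = math.sqrt(sum(c * c for c in mv))
        v = [c / norm for c in mv]
    return v
```

For two rotations of +0.2 rad and -0.2 rad about the same axis, this sketch returns (approximately) the identity quaternion, which matches the intuition that the two rotations cancel.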


CLASSQ-L: A Q-Learning Algorithm for Adversarial Real-Time Strategy Games

2012

Ulit Jaidee Héctor Muñoz-Avila

We present CLASSQ-L (for: class Q-learning), an application of the Q-learning reinforcement learning algorithm to play complete Wargus games. Wargus is a real-time strategy game where players control armies consisting of units of different classes (e.g., archers, knights). CLASSQ-L uses a single table for each class of unit, so that each unit is controlled by, and updates, its class's Q-table. This enab...
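The idea of one shared Q-table per unit class might look like the sketch below. It uses the standard tabular Q-learning update. The names `q_tables` and `update`, the learning rate, the discount factor, and the state and action labels are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict

ALPHA = 0.1  # learning rate (assumed value)
GAMMA = 0.9  # discount factor (assumed value)

# One Q-table per unit class: unit class -> (state, action) -> value.
q_tables = defaultdict(lambda: defaultdict(float))

def update(unit_class, state, action, reward, next_state, actions):
    """Standard Q-learning update applied to the table shared by a unit class."""
    table = q_tables[unit_class]
    best_next = max(table[(next_state, a)] for a in actions)
    td_target = reward + GAMMA * best_next
    table[(state, action)] += ALPHA * (td_target - table[(state, action)])
    return table[(state, action)]
```

Every unit of a given class calls `update` with the same `unit_class` key, so experience from all archers, for example, accumulates in a single table.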


Flexible theft and resolute punishment: Evolutionary dynamics of social behavior among reinforcement-learning agents

2014

James MacGlashan Michael L. Littman Fiery Cushman

Existing models of the evolution of social behavior typically involve innate strategies such as tit-for-tat. Yet, both behavioral and neural evidence indicates a substantial role for learned social behavior. We explore the evolutionary dynamics of two simple social behaviors among learning agents: Theft and punishment. In our simulation, agents employ Q-learning, a common reinforcement learning...


UCB and InfoGain Exploration via $\boldsymbol{Q}$-Ensembles

Journal: CoRR, 2017

Richard Y. Chen Szymon Sidor Pieter Abbeel John Schulman

We show how an ensemble of Q-functions can be leveraged for more effective exploration in deep reinforcement learning. We build on well-established algorithms from the bandit setting, and adapt them to the Q-learning setting. We propose an exploration strategy based on upper-confidence bounds (UCB). Our experiments show significant gains on the Atari benchmark.
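A UCB-style action choice over an ensemble could be sketched as below. The scoring rule (ensemble mean plus a multiple of the ensemble standard deviation) is an assumption about how the bound is formed, since the abstract gives no formula. The function name `ucb_action` and the parameter `lam` are hypothetical.

```python
import statistics

def ucb_action(q_ensemble, lam=1.0):
    """Pick the action with the highest mean + lam * std across the ensemble.

    q_ensemble: list of K lists; q_ensemble[k][a] is member k's Q-value
    for action a in the current state.
    """
    n_actions = len(q_ensemble[0])
    scores = []
    for a in range(n_actions):
        vals = [member[a] for member in q_ensemble]
        # Disagreement between members acts as the exploration bonus.
        scores.append(statistics.fmean(vals) + lam * statistics.pstdev(vals))
    return max(range(n_actions), key=scores.__getitem__)
```

With `lam=0` this reduces to greedy selection on the ensemble mean. Larger values favor actions on which the ensemble members disagree.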


Afrl-ry-wp-tp-2010-1148 Detecting Primary Signals for Efficient Utilization of Spectrum Using Q-learning (postprint)

2010

Y. B. Reddy

The efficient utilization of underutilized spectrum is the main theme of current research. The cognitive radio, with the help of a Q-learning algorithm, is used to detect the presence of primary signals and utilize the spectrum in the absence of primary signals. The proposed Q-learning algorithm model identifies previously known signals and learns to detect the signals which otherwise could not be ...


Evolving subjective utilities: Prisoner's Dilemma game examples

2011

Koichi Moriyama Satoshi Kurihara Masayuki Numao

We have proposed the utility-based Q-learning concept that supposes an agent internally has an emotional mechanism that derives subjective utilities from objective rewards and the agent uses the utilities as rewards of Q-learning. We have also proposed such an emotional mechanism that facilitates cooperative actions in Prisoner’s Dilemma (PD) games. However, this mechanism has been designed and...


A-Learning for Approximate Planning

2004

D. Blatt S. A. Murphy

We consider a new algorithm for reinforcement learning called A-learning. A-learning learns the advantages from a single training set. We compare A-learning with function approximation to Q-learning with function approximation and find that because A-learning approximates only the advantages it is less likely to exhibit bias due to the function approximation as compared to Q-learning. W...


Deterministic Parameterized Algorithms for Matching and Packing Problems

Journal: CoRR, 2013

Meirav Zehavi

We present three deterministic parameterized algorithms for well-studied packing and matching problems, namely, Weighted q-Dimensional p-Matching ((q, p)-WDM) and Weighted q-Set p-Packing ((q, p)-WSP). More specifically, we present an O(2.85043) time deterministic algorithm for (q, p)-WDM, an O(8.04143) time deterministic algorithm for the unweighted version of (3, p)-WDM, and an O((0.56201 · 2....


An error analysis of an algorithm for Marcum's Q-function

Journal: Computers & Mathematics with Applications, 1976


A refinement of H. C. Williams’ $q$th root algorithm

Journal: Mathematics of Computation, 1993

