Search results for: q algorithm

Number of results: 863118

Journal: :Software impacts 2021

Multi-objective reinforcement learning involves the use of techniques to address problems with multiple objectives. To resolve this, we propose a hybrid multi-objective optimization method that provides a mathematical guarantee that all policies belonging to the Pareto front can be found. The hybridization gave rise to Q-Managed, which is given by the ε-constraint method and the Q-Learning algorithm, where the first limits the environment dy...
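
The abstract is cut off, but the pairing it names (an ε-constraint guard around a Q-Learning update) can be sketched. Below is a minimal, hypothetical Python illustration: the function names, the two-objective reward tuple, and the threshold `epsilon` are all my assumptions, not the paper's actual Q-Managed implementation.

```python
from collections import defaultdict

def q_managed_step(Q, C, state, actions, env_step,
                   epsilon=5.0, alpha=0.1, gamma=0.95):
    """One hypothetical ε-constraint-guarded Q-Learning step.

    Q[s][a] estimates the optimized objective; C[s][a] estimates the
    second objective, which the ε-constraint bounds by `epsilon`.
    """
    # ε-constraint: only actions whose estimated secondary cost stays
    # within the bound are eligible (fall back to all if none qualify).
    feasible = [a for a in actions if C[state][a] <= epsilon] or list(actions)
    action = max(feasible, key=lambda a: Q[state][a])
    next_state, (reward, cost), done = env_step(state, action)
    q_target = reward if done else reward + gamma * max(Q[next_state][a] for a in actions)
    c_target = cost if done else cost + gamma * min(C[next_state][a] for a in actions)
    Q[state][action] += alpha * (q_target - Q[state][action])
    C[state][action] += alpha * (c_target - C[state][action])
    return next_state, done

Q = defaultdict(lambda: defaultdict(float))  # value of the optimized objective
C = defaultdict(lambda: defaultdict(float))  # value of the ε-constrained objective
```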

2005
Jianqin Zhou Qiang Zheng

Jianqin Zhou (Dept. of Computer Science, Anhui University of Technology, Ma'anshan 243002, P. R. China) (E-mail: [email protected]) Abstract: A fast algorithm is presented for determining the linear complexity and the minimal polynomial of periodic sequences over GF(q) with period q^n p^m, where p is a prime, q is a prime and a primitive root modulo p. The algorithm presented here generalizes...
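
The fast algorithm itself is not reproduced in the snippet. For context, the standard way to compute the linear complexity of a sequence is the Berlekamp–Massey algorithm; period-structure-specific algorithms like the one in the abstract improve on this baseline. Here is a minimal GF(2) version (my sketch, not the authors' generalization over GF(q)):

```python
def linear_complexity_gf2(s):
    """Berlekamp-Massey over GF(2): length of the shortest LFSR generating s."""
    n = len(s)
    c = [0] * n; c[0] = 1   # current connection polynomial C(x)
    b = [0] * n; b[0] = 1   # copy of C(x) from before the last length change
    L, m = 0, -1            # L = current complexity, m = index of last change
    for i in range(n):
        # Discrepancy: does the current LFSR predict bit i correctly?
        d = s[i]
        for j in range(1, L + 1):
            d ^= c[j] & s[i - j]
        if d:
            t = c[:]
            shift = i - m
            for j in range(n - shift):
                c[j + shift] ^= b[j]
            if 2 * L <= i:
                L, m, b = i + 1 - L, i, t
    return L

print(linear_complexity_gf2([1, 1, 1, 1]))   # -> 1 (the all-ones sequence)
```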

Journal: :Neurocomputing 2005
Christos Dimitrakakis Samy Bengio

Ensemble algorithms can improve the performance of a given learning algorithm through the combination of multiple base classifiers into an ensemble. In this paper we attempt to train and combine the base classifiers using an adaptive policy. This policy is learnt through a Q-learning-inspired technique. Its effectiveness for an essentially supervised task is demonstrated by experimental results...
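
As a rough illustration of the idea (learning which base classifier to delegate to), here is a hypothetical sketch; the state encoding and the ±1 reward for a correct/incorrect base prediction are my assumptions, not the paper's formulation.

```python
import random
from collections import defaultdict

def train_selector(base_classifiers, samples, episodes=100, alpha=0.1, eps=0.1):
    """Learn Q[state][k]: the value of delegating to base classifier k.

    `samples` yields (state, x, y) triples, where `state` is a hashable
    coarse description of the input (an illustrative assumption).
    """
    Q = defaultdict(lambda: [0.0] * len(base_classifiers))
    ks = range(len(base_classifiers))
    for _ in range(episodes):
        for state, x, y in samples:
            # ε-greedy choice of which base classifier to trust.
            k = (random.choice(list(ks)) if random.random() < eps
                 else max(ks, key=lambda j: Q[state][j]))
            reward = 1.0 if base_classifiers[k](x) == y else -1.0
            # One-step (bandit-style) value update toward the observed reward.
            Q[state][k] += alpha * (reward - Q[state][k])
    return Q
```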

2004
Thanatchai Kulworawanichpong Kongpol Areerak Kongpan Areerak Sarawut Sujitjorn

Harmonic identification using the Adaptive Tabu Search (ATS) method embedded in an active power filter is proposed in this paper. The ATS identifies harmonic components more accurately and precisely. Beyond accuracy and precision, it is able to select only those particular harmonic orders that cause severe consequences to the system for elimination. This principle thus leads to t...
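
The snippet breaks off before any algorithmic detail, so here is a generic tabu-search skeleton for fitting harmonic amplitudes to a measured waveform. The squared-error objective and the fixed-step coordinate moves are illustrative assumptions; the Adaptive Tabu Search of the paper adds adaptive mechanisms (e.g. back-tracking, adaptive search radius) on top of a skeleton like this.

```python
import math

def tabu_search_harmonics(measured, orders, iters=300, tabu_len=10, step=0.05):
    """Fit sine-harmonic amplitudes for the given harmonic `orders` to one
    sampled period of `measured` using plain tabu search."""
    n = len(measured)

    def error(amps):
        err = 0.0
        for i in range(n):
            t = 2 * math.pi * i / n
            est = sum(a * math.sin(h * t) for a, h in zip(amps, orders))
            err += (measured[i] - est) ** 2
        return err

    def neighbor(amps, j, d):
        out = list(amps)
        out[j] += d
        return out

    current = [0.0] * len(orders)
    best, best_err = current, error(current)
    tabu = []  # recently used moves, temporarily forbidden
    for _ in range(iters):
        moves = [(j, d) for j in range(len(orders)) for d in (-step, step)]
        candidates = [m for m in moves if m not in tabu] or moves
        j, d = min(candidates, key=lambda m: error(neighbor(current, *m)))
        current = neighbor(current, j, d)
        tabu = (tabu + [(j, -d)])[-tabu_len:]  # forbid undoing the move at once
        if error(current) < best_err:
            best, best_err = current, error(current)
    return best
```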

1993
Anton Schwartz

While most Reinforcement Learning work utilizes temporal discounting to evaluate performance, the reasons for this are unclear. Is it out of desire or necessity? We argue that it is not out of desire, and seek to dispel the notion that temporal discounting is necessary by proposing a framework for undiscounted optimization. We present a metric of undiscounted performance and an algorithm for fi...
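
The framework proposed here became known as R-learning, the classic average-reward (undiscounted) counterpart of Q-learning. A sketch of its update in the standard formulation (not the paper's exact notation): there is no discount factor; instead an estimate rho of the average reward per step is subtracted out.

```python
def r_learning_update(Q, rho, s, a, r, s2, alpha=0.1, beta=0.01):
    """One R-learning step. Q maps state -> {action: average-adjusted value};
    rho (float) estimates the average reward per step. Returns updated rho;
    Q is updated in place."""
    greedy = Q[s][a] == max(Q[s].values())   # was the chosen action greedy?
    best_next = max(Q[s2].values())
    delta = r - rho + best_next - Q[s][a]    # undiscounted TD error
    Q[s][a] += alpha * delta
    if greedy:
        rho += beta * delta                  # track average reward on greedy steps only
    return rho
```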

Journal: :Neurocomputing 2008
Benoît Frénay Marco Saerens

Markov games are a framework which formalises n-agent reinforcement learning. For instance, Littman proposed the minimax-Q algorithm to model two-agent zero-sum problems. This paper proposes a new simple algorithm in this framework, QL2, and compares it to several standard algorithms (Q-learning, Minimax and minimax-Q). Experiments show that QL2 converges to optimal mixed policies, as minimax-Q...
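
For context on the baseline this paper compares against: minimax-Q replaces the max in Q-learning's target with the value of a zero-sum matrix game, which is obtained by linear programming. A sketch using scipy (my illustration of the well-known construction; QL2 itself differs):

```python
import numpy as np
from scipy.optimize import linprog

def matrix_game_value(Q_s):
    """Value v and maximin mixed policy pi of the matrix game Q_s[a, o],
    where the row player maximizes and the opponent (columns) minimizes.
    This is the V(s) used in minimax-Q's target r + gamma * V(s')."""
    n_a, n_o = Q_s.shape
    # Variables: pi(a_1)..pi(a_n) and the game value v; minimize -v.
    c = np.zeros(n_a + 1); c[-1] = -1.0
    # For every opponent action o: sum_a pi(a) * Q[a, o] >= v.
    A_ub = np.hstack([-Q_s.T, np.ones((n_o, 1))])
    b_ub = np.zeros(n_o)
    A_eq = np.ones((1, n_a + 1)); A_eq[0, -1] = 0.0   # pi sums to 1
    b_eq = np.array([1.0])
    bounds = [(0, 1)] * n_a + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[-1], res.x[:n_a]
```

The minimax-Q update is then Q[s, a, o] ← (1 − α) Q[s, a, o] + α (r + γ V(s')).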

Journal: :CoRR 2016
Fatemeh Jahedpari

We propose the Artificial Continuous Prediction Market (ACPM) as a means to predict a continuous real value, by integrating a range of data sources and aggregating the results of different machine learning (ML) algorithms. ACPM adapts the concept of the (physical) prediction market to address the prediction of real values instead of discrete events. Each ACPM participant has a data source, a ML...
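
A toy version of the aggregation idea (stake-weighted pooling of per-agent regressors, with stakes redistributed by realized error) might look like the following; the inverse-error payoff and weighting rules are my assumptions, not ACPM's actual market mechanism.

```python
def market_predict(agents, x):
    """Aggregate continuous predictions, weighting each agent by its wealth.
    Each agent is a dict with a callable 'model' and a float 'wealth'."""
    total = sum(a["wealth"] for a in agents)
    return sum(a["wealth"] / total * a["model"](x) for a in agents)

def settle(agents, x, truth, scale=1.0):
    """Redistribute wealth toward agents whose prediction was closer to truth
    (illustrative inverse-error payoff)."""
    scores = [1.0 / (scale + abs(a["model"](x) - truth)) for a in agents]
    pot = sum(a["wealth"] for a in agents)
    for a, s in zip(agents, scores):
        a["wealth"] = pot * s / sum(scores)
```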

2014
Janis Zuters

Temporal difference algorithms perform well on discrete and small problems. This paper proposes a modification of the Q-learning algorithm towards a natural ability to receive a feature list, instead of an already identified state, as input. Complete observability is still assumed. The algorithm, Naive Augmenting Q-Learning, has been designed by building a hierarchical structure of input f...
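
The naive end of the idea (consuming a feature list rather than a pre-identified state) can be approximated by keying the Q-table on the feature tuple itself; a sketch under that assumption follows (the paper's hierarchical structure over features is not reproduced here).

```python
from collections import defaultdict

def q_update_on_features(Q, features, action, reward, next_features,
                         alpha=0.1, gamma=0.9, done=False):
    """Tabular Q-learning where the 'state' is the raw feature list.

    Keying on tuple(features) treats every distinct feature combination
    as its own state; full observability is assumed, as in the paper.
    """
    s, s2 = tuple(features), tuple(next_features)
    target = reward if done else reward + gamma * max(Q[s2])
    Q[s][action] += alpha * (target - Q[s][action])

# Q maps feature tuples to per-action value lists.
Q = defaultdict(lambda: [0.0] * 4)   # 4 actions, an arbitrary choice here
```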

1997
Gary Boone

Several methods have been proposed in the reinforcement learning literature for learning optimal policies for sequential decision tasks. Q-learning is a model-free algorithm that has recently been applied to the Acrobot, a two-link arm with a single actuator at the elbow that learns to swing its free endpoint above a target height. However, applying Q-learning to a real Acrobot may be impracti...
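
As a concrete illustration of applying tabular Q-learning to the Acrobot in simulation, here is a sketch using the Gymnasium environment; the coarse discretization of the continuous observation is my assumption (and, as the abstract suggests, the sample cost of this approach is part of why it is impractical on real hardware).

```python
import gymnasium as gym
import numpy as np
from collections import defaultdict

env = gym.make("Acrobot-v1")
Q = defaultdict(lambda: np.zeros(env.action_space.n))
alpha, gamma, eps = 0.1, 0.99, 0.1

def bucket(obs, bins=6):
    # Coarse discretization of the 6-dim continuous observation (an
    # illustrative choice; too coarse for good swing-up performance).
    return tuple(np.digitize(x, np.linspace(-1, 1, bins)) for x in obs)

for episode in range(200):
    obs, _ = env.reset()
    s, done = bucket(obs), False
    while not done:
        a = (env.action_space.sample() if np.random.rand() < eps
             else int(np.argmax(Q[s])))
        obs, r, terminated, truncated, _ = env.step(a)
        done = terminated or truncated
        s2 = bucket(obs)
        target = r if terminated else r + gamma * np.max(Q[s2])
        Q[s][a] += alpha * (target - Q[s][a])
        s = s2
```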

2013
Andrew V. Sutherland

Theorem 12.2. Let $p$ and $q$ be prime divisors of $N$, and let $\ell_p$ and $\ell_q$ be the largest prime divisors of $p-1$ and $q-1$, respectively. If $\ell_p \le B$ and $\ell_p < \ell_q$ then Algorithm 12.1 succeeds with probability at least $1 - \frac{1}{\ell_q}$. Proof. If $a \equiv 0 \bmod p$ then the algorithm succeeds in step 2, so we may assume $a \perp p$. When the algorithm reaches $\ell = \ell_p$ in step 3 we have $b = a^m$, where $m = \prod_{\ell \le \ell_p} \ell^{e}$ is a multipl...
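
Algorithm 12.1 referenced here is the Pollard p−1 factorization method. A compact version of the stage the theorem analyzes (raising a to every prime power ℓ^e ≤ B in turn and checking a gcd) can be sketched as:

```python
from math import gcd

def pollard_p_minus_1(N, B, a=2):
    """Pollard's p-1 method, stage 1 with smoothness bound B.

    Raises a to l^e mod N for each prime l <= B (with l^e <= B) and checks
    gcd(b - 1, N); succeeds when some prime p | N has p - 1 B-powersmooth,
    since then b = 1 (mod p).
    """
    if gcd(a, N) > 1:
        return gcd(a, N)                # lucky: a already shares a factor with N
    sieve = [True] * (B + 1)            # simple sieve for the primes up to B
    b = a
    for l in range(2, B + 1):
        if sieve[l]:
            for k in range(2 * l, B + 1, l):
                sieve[k] = False
            pe = l                      # largest power of l not exceeding B
            while pe * l <= B:
                pe *= l
            b = pow(b, pe, N)
            d = gcd(b - 1, N)
            if 1 < d < N:
                return d                # nontrivial factor found
    return None                         # stage 1 failed; raise B or run stage 2
```

For example, pollard_p_minus_1(299, 10) returns 13, since 13 − 1 = 12 = 2^2 · 3 is 10-powersmooth while 23 − 1 = 22 is not.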
