Search results for: q algorithm

Number of results: 863118

Journal: :Software impacts 2021

Multi-objective reinforcement learning involves the use of techniques to address problems with multiple objectives. To resolve this, we propose a hybrid multi-objective optimization method that provides a mathematical guarantee that all policies belonging to the Pareto front can be found. The hybridization gave rise to Q-Managed, which is given by the ε-constraint method and the Q-Learning algorithm, where the first limits the environment dy...
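
The abstract is cut off, but the pairing it names (an ε-constraint guard around a Q-Learning update) can be sketched. Below is a minimal, hypothetical Python illustration: the function names, the two-objective reward tuple, and the threshold `epsilon` are all my assumptions, not the paper's actual Q-Managed implementation.

```python
from collections import defaultdict

def q_managed_step(Q, C, state, actions, env_step,
                   epsilon=5.0, alpha=0.1, gamma=0.95):
    """One hypothetical ε-constraint-guarded Q-Learning step.

    Q[s][a] estimates the optimized objective; C[s][a] estimates the
    second objective, which the ε-constraint bounds by `epsilon`.
    """
    # ε-constraint: only actions whose estimated secondary cost stays
    # within the bound are eligible (fall back to all if none qualify).
    feasible = [a for a in actions if C[state][a] <= epsilon] or list(actions)
    action = max(feasible, key=lambda a: Q[state][a])
    next_state, (reward, cost), done = env_step(state, action)
    q_target = reward if done else reward + gamma * max(Q[next_state][a] for a in actions)
    c_target = cost if done else cost + gamma * min(C[next_state][a] for a in actions)
    Q[state][action] += alpha * (q_target - Q[state][action])
    C[state][action] += alpha * (c_target - C[state][action])
    return next_state, done

Q = defaultdict(lambda: defaultdict(float))  # value of the optimized objective
C = defaultdict(lambda: defaultdict(float))  # value of the ε-constrained objective
```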

2005
Jianqin Zhou Qiang Zheng

Jianqin Zhou (Dept. of Computer Science, Anhui University of Technology, Ma'anshan 243002, P. R. China) (E-mail: [email protected]) Abstract: A fast algorithm is presented for determining the linear complexity and the minimal polynomial of periodic sequences over GF(q) with period q^n p^m, where p is a prime, q is a prime and a primitive root modulo p. The algorithm presented here generalizes...
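
The fast algorithm itself is not reproduced in the snippet. For context, the standard way to compute the linear complexity of a sequence is the Berlekamp–Massey algorithm; period-structure-specific algorithms like the one in the abstract improve on this baseline. Here is a minimal GF(2) version (my sketch, not the authors' generalization over GF(q)):

```python
def linear_complexity_gf2(s):
    """Berlekamp-Massey over GF(2): length of the shortest LFSR generating s."""
    n = len(s)
    c = [0] * n; c[0] = 1   # current connection polynomial C(x)
    b = [0] * n; b[0] = 1   # copy of C(x) from before the last length change
    L, m = 0, -1            # L = current complexity, m = index of last change
    for i in range(n):
        # Discrepancy: does the current LFSR predict bit i correctly?
        d = s[i]
        for j in range(1, L + 1):
            d ^= c[j] & s[i - j]
        if d:
            t = c[:]
            shift = i - m
            for j in range(n - shift):
                c[j + shift] ^= b[j]
            if 2 * L <= i:
                L, m, b = i + 1 - L, i, t
    return L

print(linear_complexity_gf2([1, 1, 1, 1]))   # -> 1 (the all-ones sequence)
```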

Journal: :Neurocomputing 2005
Christos Dimitrakakis Samy Bengio

Ensemble algorithms can improve the performance of a given learning algorithm through the combination of multiple base classifiers into an ensemble. In this paper we attempt to train and combine the base classifiers using an adaptive policy. This policy is learnt through a Q-learning-inspired technique. Its effectiveness for an essentially supervised task is demonstrated by experimental results...
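
As a rough illustration of the idea (learning which base classifier to delegate to), here is a hypothetical sketch; the state encoding and the ±1 reward for a correct/incorrect base prediction are my assumptions, not the paper's formulation.

```python
import random
from collections import defaultdict

def train_selector(base_classifiers, samples, episodes=100, alpha=0.1, eps=0.1):
    """Learn Q[state][k]: the value of delegating to base classifier k.

    `samples` yields (state, x, y) triples, where `state` is a hashable
    coarse description of the input (an illustrative assumption).
    """
    Q = defaultdict(lambda: [0.0] * len(base_classifiers))
    ks = range(len(base_classifiers))
    for _ in range(episodes):
        for state, x, y in samples:
            # ε-greedy choice of which base classifier to trust.
            k = (random.choice(list(ks)) if random.random() < eps
                 else max(ks, key=lambda j: Q[state][j]))
            reward = 1.0 if base_classifiers[k](x) == y else -1.0
            # One-step (bandit-style) value update toward the observed reward.
            Q[state][k] += alpha * (reward - Q[state][k])
    return Q
```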

2004
Thanatchai Kulworawanichpong Kongpol Areerak Kongpan Areerak Sarawut Sujitjorn

Harmonic identification using the Adaptive Tabu Search (ATS) method embedded in an active power filter is proposed in this paper. The ATS identifies harmonic components more accurately and precisely. Beyond accuracy and precision, it is able to select only those particular harmonic orders that cause severe consequences to the system for elimination. This principle thus leads to t...
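
The snippet breaks off before any algorithmic detail, so here is a generic tabu-search skeleton for fitting harmonic amplitudes to a measured waveform. The squared-error objective and the fixed-step coordinate moves are illustrative assumptions; the Adaptive Tabu Search of the paper adds adaptive mechanisms (e.g. back-tracking, adaptive search radius) on top of a skeleton like this.

```python
import math

def tabu_search_harmonics(measured, orders, iters=300, tabu_len=10, step=0.05):
    """Fit sine-harmonic amplitudes for the given harmonic `orders` to one
    sampled period of `measured` using plain tabu search."""
    n = len(measured)

    def error(amps):
        err = 0.0
        for i in range(n):
            t = 2 * math.pi * i / n
            est = sum(a * math.sin(h * t) for a, h in zip(amps, orders))
            err += (measured[i] - est) ** 2
        return err

    def neighbor(amps, j, d):
        out = list(amps)
        out[j] += d
        return out

    current = [0.0] * len(orders)
    best, best_err = current, error(current)
    tabu = []  # recently used moves, temporarily forbidden
    for _ in range(iters):
        moves = [(j, d) for j in range(len(orders)) for d in (-step, step)]
        candidates = [m for m in moves if m not in tabu] or moves
        j, d = min(candidates, key=lambda m: error(neighbor(current, *m)))
        current = neighbor(current, j, d)
        tabu = (tabu + [(j, -d)])[-tabu_len:]  # forbid undoing the move at once
        if error(current) < best_err:
            best, best_err = current, error(current)
    return best
```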

1993
Anton Schwartz

While most Reinforcement Learning work utilizes temporal discounting to evaluate performance, the reasons for this are unclear. Is it out of desire or necessity? We argue that it is not out of desire, and seek to dispel the notion that temporal discounting is necessary by proposing a framework for undiscounted optimization. We present a metric of undiscounted performance and an algorithm for fi...
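
The framework proposed here became known as R-learning, the classic average-reward (undiscounted) counterpart of Q-learning. A sketch of its update in the standard formulation (not the paper's exact notation): there is no discount factor; instead an estimate rho of the average reward per step is subtracted out.

```python
def r_learning_update(Q, rho, s, a, r, s2, alpha=0.1, beta=0.01):
    """One R-learning step. Q maps state -> {action: average-adjusted value};
    rho (float) estimates the average reward per step. Returns updated rho;
    Q is updated in place."""
    greedy = Q[s][a] == max(Q[s].values())   # was the chosen action greedy?
    best_next = max(Q[s2].values())
    delta = r - rho + best_next - Q[s][a]    # undiscounted TD error
    Q[s][a] += alpha * delta
    if greedy:
        rho += beta * delta                  # track average reward on greedy steps only
    return rho
```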

Journal: :Neurocomputing 2008
Benoît Frénay Marco Saerens

Markov games are a framework which formalises n-agent reinforcement learning. For instance, Littman proposed the minimax-Q algorithm to model two-agent zero-sum problems. This paper proposes a new simple algorithm in this framework, QL2, and compares it to several standard algorithms (Q-learning, Minimax and minimax-Q). Experiments show that QL2 converges to optimal mixed policies, as minimax-Q...
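
For context on the baseline this paper compares against: minimax-Q replaces the max in Q-learning's target with the value of a zero-sum matrix game, which is obtained by linear programming. A sketch using scipy (my illustration of the well-known construction; QL2 itself differs):

```python
import numpy as np
from scipy.optimize import linprog

def matrix_game_value(Q_s):
    """Value v and maximin mixed policy pi of the matrix game Q_s[a, o],
    where the row player maximizes and the opponent (columns) minimizes.
    This is the V(s) used in minimax-Q's target r + gamma * V(s')."""
    n_a, n_o = Q_s.shape
    # Variables: pi(a_1)..pi(a_n) and the game value v; minimize -v.
    c = np.zeros(n_a + 1); c[-1] = -1.0
    # For every opponent action o: sum_a pi(a) * Q[a, o] >= v.
    A_ub = np.hstack([-Q_s.T, np.ones((n_o, 1))])
    b_ub = np.zeros(n_o)
    A_eq = np.ones((1, n_a + 1)); A_eq[0, -1] = 0.0   # pi sums to 1
    b_eq = np.array([1.0])
    bounds = [(0, 1)] * n_a + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[-1], res.x[:n_a]
```

The minimax-Q update is then Q[s, a, o] ← (1 − α) Q[s, a, o] + α (r + γ V(s')).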

Journal: :CoRR 2016
Fatemeh Jahedpari

We propose the Artificial Continuous Prediction Market (ACPM) as a means to predict a continuous real value, by integrating a range of data sources and aggregating the results of different machine learning (ML) algorithms. ACPM adapts the concept of the (physical) prediction market to address the prediction of real values instead of discrete events. Each ACPM participant has a data source, a ML...
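
A toy version of the aggregation idea (stake-weighted pooling of per-agent regressors, with stakes redistributed by realized error) might look like the following; the inverse-error payoff and weighting rules are my assumptions, not ACPM's actual market mechanism.

```python
def market_predict(agents, x):
    """Aggregate continuous predictions, weighting each agent by its wealth.
    Each agent is a dict with a callable 'model' and a float 'wealth'."""
    total = sum(a["wealth"] for a in agents)
    return sum(a["wealth"] / total * a["model"](x) for a in agents)

def settle(agents, x, truth, scale=1.0):
    """Redistribute wealth toward agents whose prediction was closer to truth
    (illustrative inverse-error payoff)."""
    scores = [1.0 / (scale + abs(a["model"](x) - truth)) for a in agents]
    pot = sum(a["wealth"] for a in agents)
    for a, s in zip(agents, scores):
        a["wealth"] = pot * s / sum(scores)
```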

2014
Janis Zuters

Temporal difference algorithms perform well on discrete and small problems. This paper proposes a modification of the Q-learning algorithm towards a natural ability to receive a feature list, instead of an already identified state, as input. Complete observability is still assumed. The algorithm, Naive Augmenting Q-Learning, has been designed by building a hierarchical structure of input f...
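
The naive end of the idea (consuming a feature list rather than a pre-identified state) can be approximated by keying the Q-table on the feature tuple itself; a sketch under that assumption follows (the paper's hierarchical structure over features is not reproduced here).

```python
from collections import defaultdict

def q_update_on_features(Q, features, action, reward, next_features,
                         alpha=0.1, gamma=0.9, done=False):
    """Tabular Q-learning where the 'state' is the raw feature list.

    Keying on tuple(features) treats every distinct feature combination
    as its own state; full observability is assumed, as in the paper.
    """
    s, s2 = tuple(features), tuple(next_features)
    target = reward if done else reward + gamma * max(Q[s2])
    Q[s][action] += alpha * (target - Q[s][action])

# Q maps feature tuples to per-action value lists.
Q = defaultdict(lambda: [0.0] * 4)   # 4 actions, an arbitrary choice here
```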

1997
Gary Boone

Several methods have been proposed in the reinforcement learning literature for learning optimal policies for sequential decision tasks. Q-learning is a model-free algorithm that has recently been applied to the Acrobot, a two-link arm with a single actuator at the elbow that learns to swing its free endpoint above a target height. However, applying Q-learning to a real Acrobot may be impracti...
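
As a concrete illustration of applying tabular Q-learning to the Acrobot in simulation, here is a sketch using the Gymnasium environment; the coarse discretization of the continuous observation is my assumption (and, as the abstract suggests, the sample cost of this approach is part of why it is impractical on real hardware).

```python
import gymnasium as gym
import numpy as np
from collections import defaultdict

env = gym.make("Acrobot-v1")
Q = defaultdict(lambda: np.zeros(env.action_space.n))
alpha, gamma, eps = 0.1, 0.99, 0.1

def bucket(obs, bins=6):
    # Coarse discretization of the 6-dim continuous observation (an
    # illustrative choice; too coarse for good swing-up performance).
    return tuple(np.digitize(x, np.linspace(-1, 1, bins)) for x in obs)

for episode in range(200):
    obs, _ = env.reset()
    s, done = bucket(obs), False
    while not done:
        a = (env.action_space.sample() if np.random.rand() < eps
             else int(np.argmax(Q[s])))
        obs, r, terminated, truncated, _ = env.step(a)
        done = terminated or truncated
        s2 = bucket(obs)
        target = r if terminated else r + gamma * np.max(Q[s2])
        Q[s][a] += alpha * (target - Q[s][a])
        s = s2
```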

2013
Andrew V. Sutherland

Theorem 12.2. Let $p$ and $q$ be prime divisors of $N$, and let $\ell_p$ and $\ell_q$ be the largest prime divisors of $p-1$ and $q-1$, respectively. If $\ell_p \le B$ and $\ell_p < \ell_q$ then Algorithm 12.1 succeeds with probability at least $1 - \frac{1}{\ell_q}$. Proof. If $a \equiv 0 \bmod p$ then the algorithm succeeds in step 2, so we may assume $a \perp p$. When the algorithm reaches $\ell = \ell_p$ in step 3 we have $b = a^m$, where $m = \prod_{\ell \le \ell_p} \ell^{e}$ is a multipl...
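
Algorithm 12.1 referenced here is the Pollard p−1 factorization method. A compact version of the stage the theorem analyzes (raising a to every prime power ℓ^e ≤ B in turn and checking a gcd) can be sketched as:

```python
from math import gcd

def pollard_p_minus_1(N, B, a=2):
    """Pollard's p-1 method, stage 1 with smoothness bound B.

    Raises a to l^e mod N for each prime l <= B (with l^e <= B) and checks
    gcd(b - 1, N); succeeds when some prime p | N has p - 1 B-powersmooth,
    since then b = 1 (mod p).
    """
    if gcd(a, N) > 1:
        return gcd(a, N)                # lucky: a already shares a factor with N
    sieve = [True] * (B + 1)            # simple sieve for the primes up to B
    b = a
    for l in range(2, B + 1):
        if sieve[l]:
            for k in range(2 * l, B + 1, l):
                sieve[k] = False
            pe = l                      # largest power of l not exceeding B
            while pe * l <= B:
                pe *= l
            b = pow(b, pe, N)
            d = gcd(b - 1, N)
            if 1 < d < N:
                return d                # nontrivial factor found
    return None                         # stage 1 failed; raise B or run stage 2
```

For example, pollard_p_minus_1(299, 10) returns 13, since 13 − 1 = 12 = 2^2 · 3 is 10-powersmooth while 23 − 1 = 22 is not.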
