Search results for: q algorithm
Number of results: 863118
This paper aims to use the previous work related to the DELPHI method, and, in particular, the Q-Sort method for information retrieval of a panel of experts, to provide a new and simple algorithm to generate Q-Sort matrices that adjust to the size of a given survey to have more questions whose weight is null for the outcome of the round, giving experts the need to prioritise some questions abov...
This paper introduces an approach to reinforcement learning that compares immediate rewards using a variation of the Q-Learning algorithm. Unlike conventional Q-Learning, the proposed algorithm compares the current reward with the immediate reward of the past move and acts accordingly. Relative-reward-based Q-learning is an approach towards interactive learning. Q-Learning is a model free re...
This paper shows how to create near-optimal instances of the Certified Write-All algorithm called AWT that was introduced by Anderson and Woll [2]. This algorithm is the best known deterministic algorithm that can be used to simulate n synchronous parallel processors on n asynchronous processors. In this algorithm n processors update n memory cells and then signal the completion of the updates....
Elliptic curve cryptosystems (ECC) are a newer generation of public-key cryptosystems that have a smaller key size for the same level of security. Exponentiation on an elliptic curve is the most important operation in ECC, so when ECC is put into practice, the major problem is how to speed up the exponentiation. It is thus of great interest to develop algorithms for exponentiation...
In this paper, the operation scheduling of Microgrids (MGs), including Distributed Energy Resources (DERs) and Energy Storage Systems (ESSs), is proposed using a Deep Reinforcement Learning (DRL) based approach. Due to the dynamic characteristic of the problem, it is first formulated as a Markov Decision Process (MDP). Next, the Deep Deterministic Policy Gradient (DDPG) algorithm is presented t...
We prove the convergence of an actor/critic algorithm that is equivalent to Q-learning by construction. Its equivalence is achieved by encoding Q-values within the policy and value function of the actor and critic. The resultant actor/critic algorithm is novel in two ways: it updates the critic only when the most probable action is executed from any given state, and it rewards the actor using c...
A central problem in learning in complex environments is balancing exploration of untested actions against exploitation of actions that are known to be good. The benefit of exploration can be estimated using the classical notion of Value of Information—the expected improvement in future decision quality that might arise from the information acquired by exploration. Estimating this quantity requ...
We introduce a generalized Lilbert [Lucas-Hilbert] matrix. Explicit formulæ are derived for the LU-decomposition and their inverses, as well as the Cholesky decomposition. The approach is to use q-analysis and to leave the justification of the necessary identities to the q-version of Zeilberger’s celebrated algorithm.
In this paper ε-MDP-models are introduced and convergence theorems are proven using the generalized MDP framework of Szepesvári and Littman. Using this model family, we show that Q-learning is capable of finding near-optimal policies in varying environments. The potential of this new family of MDP models is illustrated via a reinforcement learning algorithm called event-learning which separates...
We study a classification problem where each feature can be acquired for a cost and the goal is to optimize the trade-off between classification precision and the total feature cost. We frame the problem as a sequential decision-making problem, where we classify one sample in each episode. At each step, an agent can use values of acquired features to decide whether to purchase another one or wh...
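Several of the abstracts above build on Q-learning. As a point of reference, the standard tabular Q-learning update can be sketched as follows. This is a minimal illustration of the textbook rule, not the method of any listed paper; the chain environment, the function names `q_learning_update` and `train_chain`, and all parameter values are assumptions made for the example.

```python
import random


def q_learning_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one standard tabular Q-learning update in place; return the new value."""
    # Bootstrap from the best action value currently estimated for the next state.
    best_next = max(q[next_state].values()) if q[next_state] else 0.0
    td_target = reward + gamma * best_next
    q[state][action] += alpha * (td_target - q[state][action])
    return q[state][action]


def train_chain(n_states=4, episodes=500, epsilon=0.2, seed=0):
    """Hypothetical toy task: states 0..n-1 in a row, reward 1.0 on reaching the last one."""
    rng = random.Random(seed)
    actions = ("left", "right")
    q = {s: {a: 0.0 for a in actions} for s in range(n_states)}
    for _ in range(episodes):
        state = 0
        while state != n_states - 1:
            # Epsilon-greedy choice: explore with probability epsilon, else act greedily.
            if rng.random() < epsilon:
                action = rng.choice(actions)
            else:
                action = max(q[state], key=q[state].get)
            next_state = max(state - 1, 0) if action == "left" else state + 1
            reward = 1.0 if next_state == n_states - 1 else 0.0
            q_learning_update(q, state, action, reward, next_state)
            state = next_state
    return q


if __name__ == "__main__":
    learned = train_chain()
    for s in sorted(learned):
        print(s, learned[s])
```

After training on this toy chain, the learned value of moving right exceeds that of moving left in every non-terminal state. The terminal state's values are never updated and stay at zero, so the final transition bootstraps from zero.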