Search results for: q policy

Number of results: 381585

Journal: Technology Innovation Management Review 2015

2016
Taiki Takahashi

Acknowledgements: The research reported in this paper was supported by a grant from the Grant-in-Aid for Scientific Research ("21st century center of excellence" grant) from the Ministry of Education, Culture, Sports, Science and Technology of Japan. Summary: Impulsivity and inconsistency in intertemporal choice (discounting) have drawn attention in econophysics and neuroeconomics. Although it...

Journal: Neurocomputing 2005
Christos Dimitrakakis Samy Bengio

Ensemble algorithms can improve the performance of a given learning algorithm through the combination of multiple base classifiers into an ensemble. In this paper we attempt to train and combine the base classifiers using an adaptive policy. This policy is learnt through a Q-learning inspired technique. Its effectiveness for an essentially supervised task is demonstrated by experimental results...

Journal: European Journal of Operational Research 2022

This paper addresses the single-item single-stocking location non-stationary stochastic lot-sizing problem under a reorder point -- order quantity control strategy. The reorder points and order quantities are chosen at the beginning of the planning horizon. The reorder points are allowed to vary with time, and we consider the order quantities either to be a series of time-dependent constants or a fixed value; this leads to two variants of the policy: the (s_t, Q_t) and (s_t, Q) policies, respecti...
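A minimal sketch of how a reorder point / order quantity rule behaves over a short horizon may help make the two policy variants concrete. It is not the paper's method: the function name `simulate_sq_policy`, the zero lead time, the "at or below the reorder point" trigger, the single order per period, and the demand figures are all assumptions made for illustration.

```python
def simulate_sq_policy(demands, reorder_points, order_qty, start_inventory=0):
    """Simulate a reorder point / order quantity policy (illustrative only).

    Assumptions (not taken from the paper): zero lead time, at most one
    order per period, an order is triggered when inventory is at or below
    the period's reorder point, and unmet demand is backordered
    (inventory may go negative).

    order_qty may be a single number (the (s_t, Q) variant) or a list with
    one quantity per period (the (s_t, Q_t) variant).
    """
    inventory = start_inventory
    history = []
    for t, demand in enumerate(demands):
        # Pick this period's order quantity: time-dependent or fixed.
        q = order_qty[t] if isinstance(order_qty, (list, tuple)) else order_qty
        # Order only if inventory has reached the reorder point.
        ordered = q if inventory <= reorder_points[t] else 0
        inventory += ordered - demand
        history.append((ordered, inventory))
    return history


# Hypothetical data: four periods of known demand.
demands = [3, 5, 2, 6]
reorder_points = [2, 4, 1, 5]

fixed_q = simulate_sq_policy(demands, reorder_points, 8)
varying_q = simulate_sq_policy(demands, reorder_points, [8, 8, 4, 4])
```

Each entry of the returned list is a pair (quantity ordered, end-of-period inventory), so the two runs can be compared period by period.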

2016
Rémi Munos Tom Stepleton Anna Harutyunyan Marc G. Bellemare

In this work, we take a fresh look at some old and new algorithms for off-policy, return-based reinforcement learning. Expressing these in a common form, we derive a novel algorithm, Retrace(λ), with three desired properties: (1) it has low variance; (2) it safely uses samples collected from any behaviour policy, whatever its degree of “off-policyness”; and (3) it is efficient as it makes the b...

Journal: CoRR 2015
ShouGuang Wang Dan You MengChu Zhou Carla Seatzu

Linear constraint transformation is an essential step to solve the forbidden state problem in Petri nets that contain uncontrollable transitions. This work studies the equivalent transformation from a legal-marking set to its admissible-marking set given such a net. First, the concepts of an escaping-marking set and a transforming-marking set are defined. Based on them, two algorit...

2003
Gerald Tesauro

Recent multi-agent extensions of Q-Learning require knowledge of other agents’ payoffs and Q-functions, and assume game-theoretic play at all times by all other agents. This paper proposes a fundamentally different approach, dubbed “Hyper-Q” Learning, in which values of mixed strategies rather than base actions are learned, and in which other agents’ strategies are estimated from observed actio...

Journal: CoRR 2018
Yelong Shen Jianshu Chen Po-Sen Huang Yuqing Guo Jianfeng Gao

Learning to walk over a graph towards a target node for a given input query and a source node is an important problem in applications such as knowledge graph reasoning. It can be formulated as a reinforcement learning (RL) problem that has a known state transition model, but with partial observability and sparse reward. To overcome these challenges, we develop a graph walking agent called Reinf...

2009
Alireza Esfahani Morteza Analoui

This paper proposes and evaluates a QoS routing policy for communication networks with packet topology and irregular traffic, called K-Shortest Widest Paths Q-Routing. The reinforcement signals are evaluated using Q-learning. Compared to standard Q-Routing, path exploration is limited to the K best loop-free paths in terms of hop count (the number of routers in a path) leading ...
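For orientation, the standard Q-Routing update that this abstract compares against can be sketched as follows. This is the generic delivery-time update, not the paper's K-Shortest Widest Paths variant: the function name `q_routing_update`, the table layout, and the `candidates` map (standing in for "next hops on the K best loop-free paths") are assumptions, and bandwidth ("widest") selection is not modelled.

```python
def q_routing_update(Q, node, dest, neighbor, queue_delay, link_delay,
                     candidates, lr=0.5):
    """Apply one standard Q-Routing update at `node` (illustrative only).

    Q[x][(dest, y)] is node x's estimate of the delivery time to `dest`
    when forwarding through neighbor y. `candidates[y]` lists the next
    hops y is allowed to consider for `dest`; restricting this list is a
    hypothetical stand-in for limiting exploration to K best paths.
    """
    if neighbor == dest:
        remaining = 0.0  # the packet arrives once it reaches the neighbor
    else:
        # The neighbor reports its best remaining-time estimate.
        remaining = min(Q[neighbor][(dest, z)] for z in candidates[neighbor])
    target = queue_delay + link_delay + remaining
    # Move the old estimate toward the new target by the learning rate.
    Q[node][(dest, neighbor)] += lr * (target - Q[node][(dest, neighbor)])
    return Q[node][(dest, neighbor)]


# Hypothetical three-hop example: A forwards a packet for D through B.
Q = {
    "A": {("D", "B"): 10.0},
    "B": {("D", "C"): 4.0, ("D", "D"): 6.0},
}
candidates = {"B": ["C", "D"]}
updated = q_routing_update(Q, "A", "D", "B", queue_delay=1.0,
                           link_delay=1.0, candidates=candidates)
```

Shrinking `candidates["B"]` changes which estimates B may report, which is the only place a path restriction enters this sketch.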

Journal: CoRR 2016
Joshua Achiam

A key problem in reinforcement learning for control with general function approximators (such as deep neural networks and other nonlinear functions) is that, for many algorithms employed in practice, updates to the policy or Q-function may fail to improve performance—or worse, actually cause the policy performance to degrade. Prior work has addressed this for policy iteration by deriving tight ...
