q policy

Williams-Baird Counterexample for Q-Factor Asynchronous Policy Iteration

2010

Dimitri P. Bertsekas

A counterexample due to Williams and Baird [WiB93] (Example 2 in their paper) is transcribed here in the context and notation of two papers by Bertsekas and Yu [BeY10a], [BeY10b], and it is also adapted to the case of Q-factor-based policy iteration. The example illustrates that cycling is possible in asynchronous policy iteration if the initial policy and cost/Q-factor iterations do not satisfy...

On the $(Q,r)$ policy for perishables with positive lead times and multiple outstanding orders

Journal: Annals of Operations Research, 2018

Minimum information divergence of Q-functions for dynamic treatment resumes

Journal: Information Geometry, 2022

This paper presents a new application of information geometry to reinforcement learning, focusing on dynamic treatment resumes. In the standard reinforcement learning framework, the Q-function is defined as the conditional expectation of the reward given a state and an action in the single-stage situation. We introduce an equivalence relation, called policy equivalence, on the space of all Q-functions. A class divergence every s...

Composable Deep Reinforcement Learning for Robotic Manipulation

2018

Tuomas Haarnoja, Vitchyr Pong, Aurick Zhou, Murtaza Dalal, Pieter Abbeel, Sergey Levine

Model-free deep reinforcement learning has been shown to exhibit good performance in domains ranging from video games to simulated robotic manipulation and locomotion. However, model-free methods are known to perform poorly when the interaction time with the environment is limited, as is the case for most real-world robotic tasks. In this paper, we study how maximum entropy policies trained usi...

Fuzzy State Aggregation and Off-policy Reinforcement Learning for Stochastic Environments

2006

Dean C. Wardell

Reinforcement learning is one of the more attractive machine learning technologies, due to its unsupervised learning structure and ability to continually learn even as the environment it is operating in changes. This ability to learn in an unsupervised manner in a changing environment is applicable in complex domains through the use of function approximation of the domain’s policy. The function...

Optimizing Spoken Dialogue Management from Data Corpora with Fitted Value Iteration

2010

Senthilkumar Chandramohan, Matthieu Geist, Olivier Pietquin

In recent years machine learning approaches have been proposed for dialogue management optimization in spoken dialogue systems. It is customary to cast the dialogue management problem into a Markov Decision Process (MDP) and to find the associated optimal policy using Reinforcement Learning (RL) algorithms. Yet, the dialogue state space is usually very large (even infinite) and standard RL algo...

Optimizing spoken dialogue management with fitted value iteration

2010

Senthilkumar Chandramohan, Matthieu Geist, Olivier Pietquin

In recent years machine learning approaches have been proposed for dialogue management optimization in spoken dialogue systems. It is customary to cast the dialogue management problem into a Markov Decision Process (MDP) and to find the associated optimal policy using Reinforcement Learning (RL) algorithms. Yet, the dialogue state space is usually very large (even infinite) and standard RL algo...

Integral Policy Iterations for Reinforcement Learning Problems in Continuous Time and Space

Journal: CoRR, 2017

Jae Young Lee, Richard S. Sutton

Policy iteration (PI) is a recursive process of policy evaluation and improvement for solving an optimal decision-making problem, e.g., a reinforcement learning (RL) or optimal control problem, and has served as a foundation for developing RL methods. Motivated by integral PI (IPI) schemes in optimal control and RL methods in continuous time and space (CTS), this paper proposes on-policy IPI to solve the gene...

Object Focused Q-learning for Autonomous Agents

2013

Luis C. Cobo, Charles Lee Isbell, Andrea Lockerd Thomaz

We present Object Focused Q-learning (OF-Q), a novel reinforcement learning algorithm that can offer exponential speed-ups over classic Q-learning on domains composed of independent objects. An OF-Q agent treats the state space as a collection of objects organized into different object classes. Our key contribution is a control policy that uses non-optimal Q-functions to estimate the risk of ig...

Q-learning and policy iteration algorithms for stochastic shortest path problems

Journal: Annals of Operations Research, 2012
