Search results for: q policy
Number of results: 381585
It is increasingly acknowledged that many threats to an organisation’s computer systems can be attributed to the behaviour of computer users. To quantify these human-based information security vulnerabilities, we are developing the Human Aspects of Information Security Questionnaire (HAIS-Q). The aim of this paper was twofold. The first aim was to outline the conceptual development of the HAIS-...
This paper presents a new method to learn online policies in continuous state, continuous action, model-free Markov decision processes, with two properties that are crucial for practical applications. First, the policies are implementable with a very low computational cost: once the policy is computed, the action corresponding to a given state is obtained in logarithmic time with respect to the...
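The snippet is truncated before it names the underlying data structure, so as a hedged illustration of how a computed policy can answer state queries in logarithmic time, the sketch below stores the policy in a balanced decision tree over the state space. This is a hypothetical stand-in, not the paper's method; `Node`, `select_action`, and the bang-bang example are invented for illustration.

```python
# Hypothetical sketch (not the paper's exact method): a policy stored as a
# balanced decision tree over the state space gives O(log n) action lookup,
# where n is the number of leaf cells.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    dim: int = 0            # state dimension tested at this node
    threshold: float = 0.0  # split point
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    action: Optional[float] = None  # set only at leaves

def select_action(node: Node, state: list) -> float:
    """Descend from root to leaf: O(depth) = O(log n) for a balanced tree."""
    while node.action is None:
        node = node.left if state[node.dim] <= node.threshold else node.right
    return node.action

# Toy 1-D example: a bang-bang policy switching at state[0] = 0.
policy = Node(dim=0, threshold=0.0,
              left=Node(action=+1.0), right=Node(action=-1.0))
print(select_action(policy, [-0.3]))  # +1.0
```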
– All-unit order frequency discount
  • 94%-effective policy for a variable base period
– Incremental order frequency discount
  • 94%-effective policy for a fixed base period
  • 98%-effective policy for a variable base period
Introduction
• A supply chain is a network of facilities that procure raw materials, transform them into intermediate goods and then final products, and deliver the products t...
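The 94%/98% effectiveness figures are characteristic of power-of-two reorder-interval policies. As a hedged sketch, assuming an EOQ-style average cost c(T) = K/T + g·T (the function `power_of_two_interval` and the constants below are invented for illustration), the best interval of the form base · 2^k can be found by rounding in log space:

```python
import math

# Hedged sketch: power-of-two reorder-interval policies, the classic setting
# behind 94%-effective (fixed base period) and 98%-effective (variable base
# period) guarantees. K = fixed order cost, g = holding-cost rate; the
# unconstrained optimum of c(T) = K/T + g*T is T* = sqrt(K/g).
def power_of_two_interval(K: float, g: float, base: float) -> float:
    """Best reorder interval of the form base * 2**k (k an integer)."""
    cost = lambda T: K / T + g * T
    t_star = math.sqrt(K / g)
    k = round(math.log2(t_star / base))          # nearest power of two
    return min((base * 2 ** (k + d) for d in (-1, 0, 1)), key=cost)

K, g, base = 100.0, 4.0, 1.0
T = power_of_two_interval(K, g, base)
ratio = (K / T + g * T) / (2 * math.sqrt(K * g))  # cost vs. true optimum
print(T, ratio)  # cost within ~1.06x of optimal, i.e. at least 94% effective
```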
As powerful probabilistic models for optimal policy search, partially observable Markov decision processes (POMDPs) still suffer from problems such as hidden state and uncertainty in action effects. In this paper, a novel approximate algorithm, Genetic-Algorithm-based Q-Learning (GAQ-Learning), is proposed to solve POMDP problems. In the proposed methodology, genetic algorithms maintain ...
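The snippet cuts off before describing what the genetic algorithms maintain. A minimal hedged sketch of the GA half of such a method, assuming the GA keeps a population of policy parameter vectors scored by simulated return (`episode_return` below is a toy stand-in for POMDP rollouts with Q-learning):

```python
import numpy as np

# Hedged sketch of the genetic-algorithm half of a GAQ-Learning-style method
# (the snippet above is truncated, so the paper's exact operators are unknown):
# a GA maintains a population of policy parameter vectors, each scored by
# simulated return.
rng = np.random.default_rng(0)

def episode_return(theta: np.ndarray) -> float:
    # Stand-in fitness: a real implementation would roll the policy out in
    # the POMDP, with Q-learning refining value estimates along the way.
    return float(-np.sum((theta - 0.5) ** 2))  # toy objective, optimum at 0.5

pop = rng.normal(size=(20, 8))                      # 20 policies, 8 params each
for _ in range(50):
    fitness = np.array([episode_return(p) for p in pop])
    elite = pop[np.argsort(fitness)[-5:]]           # keep the 5 fittest
    parents = elite[rng.integers(0, 5, size=20)]    # resample with replacement
    pop = parents + rng.normal(scale=0.1, size=pop.shape)  # Gaussian mutation
print(max(episode_return(p) for p in pop))          # approaches 0.0
```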
This paper presents a new problem-solving approach that is able to generate an optimal policy for finite-state stochastic sequential decision-making problems with high data efficiency. The proposed algorithm iteratively builds and improves an approximate Markov Decision Process (MDP) model along with cost-to-go value approximations by generating finite-length trajectories through the state-...
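As a hedged sketch of the general idea (not the paper's exact algorithm), one can build an empirical MDP model from sampled transitions and then compute cost-to-go values on that model by value iteration; the `record` helper and the toy dynamics below are invented for illustration:

```python
import numpy as np

# Hedged sketch: estimate an MDP model (transitions and costs) from sampled
# trajectories, then run value iteration on the empirical model.
n_states, n_actions, gamma = 5, 2, 0.95
counts = np.zeros((n_states, n_actions, n_states))
costs = np.zeros((n_states, n_actions))

def record(s, a, c, s_next):
    """Update transition counts and a running-average cost model."""
    counts[s, a, s_next] += 1
    n = counts[s, a].sum()
    costs[s, a] += (c - costs[s, a]) / n

# record() would be called for every (s, a, cost, s') along each trajectory;
# here we feed it toy transitions instead.
rng = np.random.default_rng(0)
for _ in range(2000):
    s, a = rng.integers(n_states), rng.integers(n_actions)
    record(s, a, float(s == 0), (s + a) % n_states)

P = counts / np.maximum(counts.sum(axis=2, keepdims=True), 1)  # empirical P
V = np.zeros(n_states)
for _ in range(200):                                # value iteration on model
    V = np.min(costs + gamma * P @ V, axis=1)
print(V)
```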
This paper presents a new approach to hierarchical reinforcement learning based on decomposing the target Markov decision process (MDP) into a hierarchy of smaller MDPs and decomposing the value function of the target MDP into an additive combination of the value functions of the smaller MDPs. The decomposition, known as the MAXQ decomposition, has both a procedural semantics—as a subroutine hi...
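The MAXQ decomposition referenced above expresses the value of a composite task additively, Q(i, s, a) = V(a, s) + C(i, s, a) with V(i, s) = max_a Q(i, s, a), where V is read directly from a table when i is a primitive action. A minimal sketch with hand-filled tables (the toy hierarchy and numbers are invented):

```python
# Hedged sketch of the MAXQ additive decomposition: the value of a composite
# task is the value of the chosen subtask plus a learned "completion" term,
#   Q(i, s, a) = V(a, s) + C(i, s, a),   V(i, s) = max_a Q(i, s, a).
def V(task, s, children, V_prim, C):
    if task in V_prim:                      # primitive action: stored value
        return V_prim[task][s]
    return max(V(a, s, children, V_prim, C) + C[(task, s, a)]
               for a in children[task])

# Toy two-level hierarchy: Root chooses between primitives a0 and a1.
children = {"Root": ["a0", "a1"]}
V_prim = {"a0": {0: 1.0}, "a1": {0: 2.0}}
C = {("Root", 0, "a0"): 0.5, ("Root", 0, "a1"): -1.0}
print(V("Root", 0, children, V_prim, C))    # max(1.0+0.5, 2.0-1.0) = 1.5
```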
We present Q-functionals, an alternative architecture for continuous-control deep reinforcement learning. Instead of returning a single value for a state-action pair, our network transforms a state into a function that can be rapidly evaluated in parallel for many actions, allowing us to efficiently choose high-value actions through sampling. This contrasts with typical off-policy control, where the policy i...
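A hedged sketch of the idea follows; the quadratic action basis and the random weight matrix standing in for a trained network are assumptions, not the paper's exact architecture:

```python
import numpy as np

# Hedged sketch of a Q-functional: the network maps a state to the
# coefficients of a function over actions (here a quadratic basis), so the
# values of many candidate actions can be evaluated in one batched product.
rng = np.random.default_rng(0)
state_dim, action_dim, n_candidates = 4, 2, 256

def action_features(a: np.ndarray) -> np.ndarray:
    """Quadratic basis over actions: [1, a, a**2] per dimension (assumed)."""
    return np.concatenate([np.ones((len(a), 1)), a, a ** 2], axis=1)

W = rng.normal(size=(state_dim, 1 + 2 * action_dim))  # stand-in for a network

def best_action(state: np.ndarray) -> np.ndarray:
    coeffs = state @ W                                  # state -> functional
    actions = rng.uniform(-1, 1, size=(n_candidates, action_dim))
    values = action_features(actions) @ coeffs          # all values at once
    return actions[np.argmax(values)]

print(best_action(rng.normal(size=state_dim)))
```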
This paper provides an analysis of error propagation in Approximate Dynamic Programming applied to zero-sum two-player Stochastic Games. We provide a novel and unified error propagation analysis in Lp-norm of three well-known algorithms adapted to Stochastic Games (namely Approximate Value Iteration, Approximate Policy Iteration and Approximate Generalized Policy Iteration). We show that we ca...
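For context, the classic single-agent, sup-norm analogue of such bounds (the paper's contribution is the finer Lp-norm analysis for stochastic games) says that if every iterate of approximate value iteration incurs error at most ε, the greedy policies π_k satisfy:

```latex
% Classic sup-norm error-propagation bound for approximate value iteration
% (single-agent MDP analogue; the paper above works in Lp-norm for games).
\|V_{k+1} - T V_k\|_{\infty} \le \varepsilon
\quad\Longrightarrow\quad
\limsup_{k \to \infty} \|V^{*} - V^{\pi_k}\|_{\infty}
\le \frac{2\gamma}{(1-\gamma)^{2}}\,\varepsilon .
```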
Background: Students' sleep patterns differ from those of their non-student peers owing to study stress and teaching workload. The aim of this study was to determine the prevalence of poor sleep quality among college students in Iran through a meta-analysis, to serve as a summary measure for policy makers in this field. Methods: In this meta-analysis study, the databases of PubMed, Science Direct...