Search results for: q policy

Number of results: 381,585

Journal: Computers & Security, 2014
Kathryn Parsons, Agata McCormac, Marcus A. Butavicius, Malcolm Robert Pattinson, Cate Jerram

It is increasingly acknowledged that many threats to an organisation’s computer systems can be attributed to the behaviour of computer users. To quantify these human-based information security vulnerabilities, we are developing the Human Aspects of Information Security Questionnaire (HAIS-Q). The aim of this paper was twofold. The first aim was to outline the conceptual development of the HAIS-...

Journal: CoRR, 2016
Ludovic Hofer, Hugo Gimbert

This paper presents a new method to learn online policies in continuous state, continuous action, model-free Markov decision processes, with two properties that are crucial for practical applications. First, the policies are implementable with a very low computational cost: once the policy is computed, the action corresponding to a given state is obtained in logarithmic time with respect to the...
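One way such a logarithmic-time action lookup can be realized (a minimal sketch under assumed details, not the authors' construction) is to store the computed policy as a sorted partition of a one-dimensional state space and binary-search it:

```python
import bisect

# Hypothetical sketch: a computed policy stored as a sorted list of
# state-interval breakpoints, each interval mapped to a constant action.
# Looking up the action for a state is then O(log n) via binary search.
breakpoints = [0.0, 0.25, 0.5, 0.75]   # partition of a 1-D state space
actions     = [-1.0, -0.5, 0.5, 1.0]   # one action per interval

def policy(state):
    """Return the action for `state` in logarithmic time."""
    i = bisect.bisect_right(breakpoints, state) - 1
    return actions[max(i, 0)]

print(policy(0.6))  # state 0.6 falls in [0.5, 0.75) -> action 0.5
```

Higher-dimensional state spaces would use a kd-tree or similar balanced structure, with the same logarithmic descent.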

Journal: Computers & OR, 2009
Brian Q. Rieksts, José A. Ventura, Yale T. Herer

– All-unit order frequency discount
  • 94%-effective policy for a variable base period
– Incremental order frequency discount
  • 94%-effective policy for a fixed base period
  • 98%-effective policy for a variable base period
Introduction
• A supply chain is a network of facilities that procure raw materials, transform them into intermediate goods and then final products, and deliver the products t...
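The effectiveness figures above come from restricting reorder intervals to power-of-two multiples of a base period. A minimal sketch for the classic EOQ cost rate c(T) = K/T + gT (the parameter values K, g, and base below are illustrative assumptions, not from the paper):

```python
import math

def cost(T, K, g):
    """Average cost per unit time: setup cost K/T plus holding cost g*T."""
    return K / T + g * T

def power_of_two_interval(T_star, base):
    """Power-of-two multiple of `base` nearest T* (within a factor sqrt(2))."""
    k = round(math.log2(T_star / base))
    return base * 2 ** k

K, g, base = 100.0, 4.0, 1.0
T_star = math.sqrt(K / g)                   # EOQ-optimal interval = 5.0
T_p2 = power_of_two_interval(T_star, base)  # nearest power of two: 4.0
ratio = cost(T_p2, K, g) / cost(T_star, K, g)
print(T_p2, round(ratio, 4))
```

Because the rounded interval is within a factor of sqrt(2) of T*, the cost ratio never exceeds (sqrt(2) + 1/sqrt(2))/2 ≈ 1.0607, which is the source of the 94%-effectiveness guarantee for a fixed base period.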

2007
Haitao Liu, Bingrong Hong

As powerful probabilistic models for optimal policy search, partially observable Markov decision processes (POMDPs) still suffer from problems such as hidden state and uncertainty in action effects. In this paper, a novel approximate algorithm, Genetic-Algorithm-based Q-Learning (GAQ-Learning), is proposed to solve POMDP problems. In the proposed methodology, genetic algorithms maintain ...
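The genetic-algorithm half of this idea can be sketched as an elitist search over deterministic policies in a tiny chain world; this is an illustrative toy, not the authors' exact GAQ-Learning algorithm, and all environment details below are invented:

```python
import random

random.seed(0)

N_STATES = 5                      # states 0..4; reaching state 4 is the goal
ACTIONS = (-1, +1)                # step left / step right

def evaluate(policy, horizon=10):
    """Fitness: farthest state reached, scaled to [0, 1]."""
    s = best = 0
    for _ in range(horizon):
        s = max(0, min(N_STATES - 1, s + policy[s]))
        best = max(best, s)
    return best / (N_STATES - 1)

def mutate(policy, rate=0.2):
    """Resample each action independently with probability `rate`."""
    return [random.choice(ACTIONS) if random.random() < rate else a
            for a in policy]

# Elitist GA: keep the fittest policy, refill the population with mutants.
pop = [[random.choice(ACTIONS) for _ in range(N_STATES)] for _ in range(10)]
for _ in range(30):
    pop.sort(key=evaluate, reverse=True)
    pop = [pop[0]] + [mutate(pop[0]) for _ in range(9)]

best_policy = max(pop, key=evaluate)
print(evaluate(best_policy))
```

In the full GAQ-Learning setting the fitness would come from Q-learning returns under partial observability rather than a hand-shaped reward.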

2012
M. Gopal

This paper presents a new problem-solving approach that is able to generate optimal policy solutions for finite-state stochastic sequential decision-making problems with high data efficiency. The proposed algorithm iteratively builds and improves an approximate Markov decision process (MDP) model along with cost-to-go value approximations by generating finite-length trajectories through the state-...
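The build-a-model-from-trajectories loop can be sketched in two steps: count observed transitions, then run value iteration on the empirical model. Everything below (the chain dynamics, costs, and sizes) is an invented toy, not the paper's algorithm:

```python
import random

random.seed(1)

N = 4                      # states 0..3; state 3 is the zero-cost goal
GAMMA = 0.9
counts = {}                # (s, a) -> {next_state: visit count}

def step(s, a):            # hidden true dynamics the trajectories sample
    return min(N - 1, s + 1) if a == 1 else max(0, s - 1)

# 1) Generate trajectories and count observed transitions.
for _ in range(200):
    s = random.randrange(N)
    for _ in range(10):
        a = random.choice((0, 1))
        s2 = step(s, a)
        counts.setdefault((s, a), {}).setdefault(s2, 0)
        counts[(s, a)][s2] += 1
        s = s2

# 2) Value iteration on the empirical model (cost 1 per step, goal free).
V = [0.0] * N
for _ in range(100):
    for s in range(N - 1):
        V[s] = min(
            1.0 + GAMMA * sum(n * V[s2] for s2, n in counts[(s, a)].items())
                        / sum(counts[(s, a)].values())
            for a in (0, 1))
print([round(v, 2) for v in V])
```

With deterministic dynamics the empirical model becomes exact once every state-action pair is visited, so the cost-to-go values converge to the true ones.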

2000
Thomas G. Dietterich

This paper presents a new approach to hierarchical reinforcement learning based on decomposing the target Markov decision process (MDP) into a hierarchy of smaller MDPs and decomposing the value function of the target MDP into an additive combination of the value functions of the smaller MDPs. The decomposition, known as the MAXQ decomposition, has both a procedural semantics—as a subroutine h...

Journal: J. Artif. Intell. Res., 2000
Thomas G. Dietterich

This paper presents a new approach to hierarchical reinforcement learning based on decomposing the target Markov decision process (MDP) into a hierarchy of smaller MDPs and decomposing the value function of the target MDP into an additive combination of the value functions of the smaller MDPs. The decomposition, known as the MAXQ decomposition, has both a procedural semantics—as a subroutine hi...
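The additive split at the heart of MAXQ is Q(parent, s, child) = V(child, s) + C(parent, s, child): the value of executing a child subtask plus the completion cost of the parent afterwards. A toy two-level illustration (the subtask names and numbers are invented, not from the paper):

```python
# V[(child, s)]: expected reward earned inside the child subtask at state s.
V = {("navigate", "s0"): -3.0,
     ("pickup",   "s0"): -1.0}

# C[(parent, s, child)]: expected reward for completing the parent task
# after the child subtask terminates.
C = {("root", "s0", "navigate"): -2.0,
     ("root", "s0", "pickup"):   -6.0}

def Q(parent, s, child):
    """MAXQ decomposition: subtask value plus completion value."""
    return V[(child, s)] + C[(parent, s, child)]

def best_child(parent, s, children):
    """Greedy choice among a parent's child subtasks."""
    return max(children, key=lambda c: Q(parent, s, c))

print(Q("root", "s0", "navigate"))                        # -3.0 + -2.0
print(best_child("root", "s0", ["navigate", "pickup"]))
```

In the full algorithm V is itself defined recursively down the hierarchy (V(i, s) = max_a Q(i, s, a) for composite subtasks), so only C tables and primitive rewards need be learned.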

Journal: Proceedings of the ... AAAI Conference on Artificial Intelligence, 2023

We present Q-functionals, an alternative architecture for continuous-control deep reinforcement learning. Instead of returning a single value for a state-action pair, our network transforms a state into a function that can be rapidly evaluated in parallel for many actions, allowing us to efficiently choose high-value actions through sampling. This contrasts with typical off-policy control, where the policy i...
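The state-to-function idea can be sketched as a map from state to coefficients of a fixed action basis, so that many sampled actions are scored with one matrix product; this is an illustrative guess at the mechanism, not the paper's architecture, and the weights here are random stand-ins for a learned network:

```python
import numpy as np

rng = np.random.default_rng(0)

W = rng.normal(size=(3, 4))        # stand-in for a learned state -> coeffs map

def coeffs(state):
    """Map a state (shape (4,)) to 3 coefficients of the action basis."""
    return W @ state

def phi(actions):
    """Polynomial action basis [1, a, a^2] for actions of shape (n, 1)."""
    return np.hstack([np.ones_like(actions), actions, actions ** 2])

state = rng.normal(size=4)
actions = rng.uniform(-1.0, 1.0, size=(256, 1))   # sample many candidate actions
q_values = phi(actions) @ coeffs(state)           # 256 Q-values in one product
best = actions[np.argmax(q_values), 0]            # greedy action via sampling
print(q_values.shape, best)
```

A learned version would replace W with a neural network and a richer basis, but the parallel scoring of sampled actions is the same.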

2015
Julien Pérolat, Bruno Scherrer, Bilal Piot, Olivier Pietquin

This paper provides an analysis of error propagation in Approximate Dynamic Programming applied to zero-sum two-player Stochastic Games. We provide a novel and unified error propagation analysis in Lp-norm of three well-known algorithms adapted to Stochastic Games (namely Approximate Value Iteration, Approximate Policy Iteration and Approximate Generalized Policy Iteration). We show that we ca...
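For context, the classical single-agent bound that such analyses generalize: if each step of approximate value iteration incurs error at most ε in sup-norm, i.e. ‖V_{k+1} − TV_k‖∞ ≤ ε for the Bellman operator T, then the greedy policies π_k satisfy

```latex
\limsup_{k \to \infty} \left\| V^{*} - V^{\pi_k} \right\|_{\infty}
  \;\le\; \frac{2\gamma}{(1-\gamma)^{2}}\,\varepsilon .
```

The Lp-norm analyses replace the sup-norm on both sides with weighted Lp-norms, which gives much tighter control when function approximation errors are small only on average.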

Background: Students' sleep patterns, owing to the stress of studying and the teaching workload, differ from those of their non-student peers. The aim of this study was to determine the prevalence of poor sleep quality among college students in Iran through a meta-analysis, to serve as a summary measure for policy makers in this field. Methods: In this meta-analysis study, the databases of PubMed, Science Direct...
