q policy

Two timescale convergent Q-learning for sleep-scheduling in wireless sensor networks

Journal: Wireless Networks 2014

Prashanth L. A., Abhranil Chatterjee, Shalabh Bhatnagar

In this paper, we consider an intrusion detection application for Wireless Sensor Networks (WSNs). We study the problem of scheduling the sleep times of the individual sensors, where the objective is to maximize the network lifetime while keeping the tracking error to a minimum. We formulate this problem as a partially-observable Markov decision process (POMDP) with continuous state-action spac...


Solving POMDP by On-Policy Linear Approximate Learning Algorithm

1999

Qiming He, Mark A. Shayman

This paper presents a fast Reinforcement Learning (RL) algorithm for solving Partially Observable Markov Decision Process (POMDP) problems. The proposed algorithm is devised to provide a policy-making framework for Network Management Systems (NMS), which is in essence an engineering application without an exact model. The algorithm consists of two phases. First, the model is estimated and policy...


Learning with Options that Terminate Off-Policy

Journal: CoRR 2017

Anna Harutyunyan, Peter Vrancx, Pierre-Luc Bacon, Doina Precup, Ann Nowé

A temporally abstract action, or an option, is specified by a policy and a termination condition: the policy guides option behavior, and the termination condition roughly determines its length. Generally, learning with longer options (like learning with multi-step returns) is known to be more efficient. However, if the option set for the task is not ideal, and cannot express the primitive optim...
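The definition of an option in the abstract above (a policy plus a termination condition) can be sketched as a small data structure. This is only an illustrative sketch, not the authors' implementation: the names `Option`, `run_option`, and `step`, the integer state and action types, and the toy chain environment are all assumptions made for the example.

```python
import random
from dataclasses import dataclass
from typing import Callable, Tuple

State = int   # hypothetical: states are integers on a chain
Action = int  # hypothetical: an action is a signed step size


@dataclass
class Option:
    """A temporally abstract action: a policy plus a termination condition."""
    policy: Callable[[State], Action]      # primitive action chosen in each state
    termination: Callable[[State], float]  # probability of stopping in each state


def run_option(option: Option, state: State,
               step: Callable[[State, Action], State],
               rng: random.Random, max_steps: int = 1000) -> Tuple[State, int]:
    """Follow the option's policy until its termination condition fires.

    Returns the final state and the number of primitive steps taken.
    """
    steps = 0
    while steps < max_steps:
        state = step(state, option.policy(state))
        steps += 1
        # Sample the termination condition in the newly reached state.
        if rng.random() < option.termination(state):
            break
    return state, steps


# Toy usage: always move right, and terminate with certainty at state 5 or beyond.
go_right = Option(policy=lambda s: 1,
                  termination=lambda s: 1.0 if s >= 5 else 0.0)
final_state, length = run_option(go_right, 0, lambda s, a: s + a, random.Random(0))
```

Here the termination condition alone determines how long the option runs, which matches the abstract's remark that it "roughly determines its length".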


A simple and robust batch-ordering inventory policy under incomplete demand knowledge

Journal: Computers & Industrial Engineering 2012

Liwei Bai, Christos Alexopoulos, Mark E. Ferguson, Kwok-Leung Tsui

Generally, the derivation of an inventory policy requires the knowledge of the underlying demand distribution. Unfortunately, in many settings such as retail, demand is not completely observable in a direct way or inventory records may be inaccurate. A variety of factors, including the potential inaccuracy of inventory records, motivate retailers to seek replenishment policies with a fixed orde...


Fuzzy State Aggregation and Policy Hill Climbing for Stochastic Environments

Journal: International Journal of Computational Intelligence and Applications 2006

Dean C. Wardell, Gilbert L. Peterson

Reinforcement learning is one of the more attractive machine learning technologies, due to its unsupervised learning structure and ability to continually learn even as the operating environment changes. Additionally, applying reinforcement learning to multiple cooperative software agents (a multi-agent system) not only allows each individual ag...


What goes around comes around: How large are spillbacks from US monetary policy?

Journal: Journal of Monetary Economics 2022

We quantify spillbacks from US monetary policy. We use structural scenario analysis and minimum relative entropy methods. Spillbacks reflect a non-trivial share of the domestic effect. They materialise through Tobin’s q/cash flow and stock market wealth effects. Spillovers from US monetary policy entail spillbacks to the domestic economy. Applying counterfactual analyses in a Bayesian proxy vector-autoregressive model, we find that spillbacks account for...


Measuring policymaking capacities of schools: validation of the Policy Making Capacities Questionnaire (PMC-Q)

Journal: School Effectiveness and School Improvement 2023

In order to support research on school effectiveness, there is a need for valid and reliable instruments to assess the policymaking capacities of schools. Increasingly, policymaking is seen as a shared responsibility of the entire pedagogical team of the school. In this article, data were analysed from a sample of 1,696 (care) teachers, coordinators, and principals in 77 Flemish primary schools, addressing critical aspects concerning the validity and reliability of the Po...


Hemisystems of Q(6,q), q odd

Journal: Journal of Combinatorial Theory, Series A 2016


Whittle index based Q-learning for restless bandits with average reward

Journal: Automatica 2022

A novel reinforcement learning algorithm is introduced for multiarmed restless bandits with average reward, using the paradigms of Q-learning and Whittle index. Specifically, we leverage the structure of the Whittle index policy to reduce the search space of Q-learning, resulting in major computational gains. Rigorous convergence analysis is provided, supported by numerical experiments. The experiments show excellent empir...
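For orientation, one common way to adapt Q-learning to the average-reward setting is the relative value iteration (RVI) form, in which the Q-value of a fixed reference state-action pair is subtracted from each target. The sketch below shows only that generic update. It is not the Whittle-index algorithm of the paper, and the function name `rvi_q_update`, the list-of-lists table, and the choice of reference pair are assumptions made for illustration.

```python
from typing import List, Tuple


def rvi_q_update(q: List[List[float]], s: int, a: int, reward: float,
                 s_next: int, alpha: float,
                 ref: Tuple[int, int] = (0, 0)) -> List[List[float]]:
    """Apply one RVI Q-learning step for average reward, updating q in place.

    q[s][a] moves toward: reward + max_b q[s_next][b] - q[ref_state][ref_action].
    Subtracting the reference entry stands in for the unknown average reward
    and keeps the table from drifting without a discount factor.
    """
    ref_state, ref_action = ref
    target = reward + max(q[s_next]) - q[ref_state][ref_action]
    q[s][a] += alpha * (target - q[s][a])
    return q


# Toy usage: two states, two actions, all Q-values initialised to zero.
q = [[0.0, 0.0], [0.0, 0.0]]
rvi_q_update(q, s=0, a=1, reward=1.0, s_next=1, alpha=0.5)
```

With a zero-initialised table the target equals the reward, so the single update above moves `q[0][1]` halfway toward 1.0.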


A novel method for QoS provisioning with protection in GMPLS networks

Journal: Computer Communications 2006

Tricha Anjali, Caterina M. Scoglio

In this paper, a new optimal policy is introduced to determine, adapt, and protect the Generalized MultiProtocol Label Switching (GMPLS) network topology based on the current traffic load. The Integrated Traffic Engineering (ITE) paradigm provides mechanisms for dynamic addition of physical capacity to optical networks. In the absence of such mechanisms, the rejection of incoming requests may b...
