q policy

Analysis of a two-level software rejuvenation policy

Journal: Rel. Eng. & Sys. Safety 2005

Wei Xie Yiguang Hong Kishor S. Trivedi

A two-level rejuvenation policy for software systems with degradation process is studied. Both full restarts and partial restarts are considered in this rejuvenation strategy. A semi-Markov process model is constructed, and based on its closed-form solution we obtain the system availability as a bivariate function. Then, the rejuvenation policy is analyzed to maximize the system availability. S...

Fine-Grain Access Control for Securing Shared Resources in Computational Grids

2002

Ali Raza Butt Sumalatha Adabala Nirav H. Kapadia Renato J. O. Figueiredo José A. B. Fortes

Computational grids provide computing power by sharing resources across administrative domains. This sharing, coupled with the need to execute untrusted code from arbitrary users, introduces security hazards. This paper addresses the security implications of making a computing resource available to untrusted applications via computational grids. It highlights the problems and limitations of curren...

Multi-agent Q-learning and Regression Trees for Automated Pricing Decisions

2000

Manu Sridharan Gerald Tesauro

We study the use of single-agent and multi-agent Q-learning to learn seller pricing strategies in three different two-seller models of agent economies, using a simple regression tree approximation scheme to represent the Q-functions. Our results are highly encouraging: regression trees match the training times and policy performance of lookup table Q-learning, while offering significant advantage...

Q-Learning with Linear Function Approximation

2007

Francisco S. Melo M. Isabel Ribeiro

In this paper, we analyze the convergence of Q-learning with linear function approximation. We identify a set of conditions that implies the convergence of this method with probability 1, when a fixed learning policy is used. We discuss the differences and similarities between our results and those obtained in several related works. We also discuss the applicability of this method when a changi...

Developing a closed-form cost expression for an (R, s, nQ) policy where the demand process is compound generalized Erlang

Journal: Oper. Res. Lett. 2007

Christian Larsen Gudrun P. Kiesmüller

We derive a closed-form cost expression for an (R,s,nQ) inventory control policy where all replenishment orders have ...

Training a real-world POMDP-based Dialogue System

2007

Blaise Thomson Jost Schatzmann Karl Weilhammer Hui Ye Steve Young

Partially Observable Markov Decision Processes provide a principled way to model uncertainty in dialogues. However, traditional algorithms for optimising policies are intractable except for cases with very few states. This paper discusses a new approach to policy optimisation based on grid-based Q-learning with a summary of belief space. We also present a technique for bootstrapping the system ...

Inventory Rationing in an (s, Q) Inventory Model with Lost Sales and Two Demand Classes

2007

Philip Melchiors Rommert Dekker Marcel Kleijn

Whenever demand for a single item can be categorized into classes of different priority, an inventory rationing policy should be considered. In this paper we analyse a continuous review (s, Q) model with lost sales and two demand classes. A so-called critical level policy is applied to ration the inventory among the two demand classes. With this policy, low-priority demand is rejected in anticipation of...

Differential Training of Rollout Policies

1997

Dimitri P. Bertsekas

We consider the approximate solution of stochastic optimal control problems using a neurodynamic programming/reinforcement learning methodology. We focus on the computation of a rollout policy, which is obtained by a single policy iteration starting from some known base policy and using some form of exact or approximate policy improvement. We indicate that, in a stochastic environment, the popu...

An ARM-Based Q-Learning Algorithm

2007

Yuan-Pao Hsu Kao-Shing Hwang Hsin-Yi Lin

This article presents an algorithm that combines ARM, an algorithm based on FAST (Flexible Adaptable-Size Topology), with the Q-learning algorithm. The ARM is a self-organizing architecture. By dynamically adjusting the size of the sensitivity region of each neuron and adaptively pruning one of the redundant neurons, the ARM can preserve resources (available neurons) to accommodate more categories. The...

Triply heavy tetraquark states with the $QQ\bar{Q}\bar{q}$ configuration

Journal: The European Physical Journal A 2017
