Search results for: q policy
Number of results: 381585
A two-level rejuvenation policy for software systems with degradation process is studied. Both full restarts and partial restarts are considered in this rejuvenation strategy. A semi-Markov process model is constructed, and based on its closed-form solution we obtain the system availability as a bivariate function. Then, the rejuvenation policy is analyzed to maximize the system availability. S...
Computational grids provide computing power by sharing resources across administrative domains. This sharing, coupled with the need to execute untrusted code from arbitrary users, introduces security hazards. This paper addresses the security implications of making a computing resource available to untrusted applications via computational grids. It highlights the problems and limitations of curren...
We study the use of single-agent and multi-agent Q-learning to learn seller pricing strategies in three different two-seller models of agent economies, using a simple regression tree approximation scheme to represent the Q-functions. Our results are highly encouraging: regression trees match the training times and policy performance of lookup table Q-learning, while offering significant advantage...
In this paper, we analyze the convergence of Q-learning with linear function approximation. We identify a set of conditions that implies the convergence of this method with probability 1, when a fixed learning policy is used. We discuss the differences and similarities between our results and those obtained in several related works. We also discuss the applicability of this method when a changi...
Developing a closed-form cost expression for an (R,s,nQ) policy where the demand process is compound generalized Erlang (Logistics/SCM Research Group). Abstract: We derive a closed-form cost expression for an (R,s,nQ) inventory control policy where all replenishment orders have ...
Partially Observable Markov Decision Processes provide a principled way to model uncertainty in dialogues. However, traditional algorithms for optimising policies are intractable except for cases with very few states. This paper discusses a new approach to policy optimisation based on grid-based Q-learning with a summary of belief space. We also present a technique for bootstrapping the system ...
Whenever demand for a single item can be categorized into classes of different priority, an inventory rationing policy should be considered. In this paper we analyse a continuous review (s, Q) model with lost sales and two demand classes. A so-called critical level policy is applied to ration the inventory among the two demand classes. With this policy, low-priority demand is rejected in anticipation of...
We consider the approximate solution of stochastic optimal control problems using a neurodynamic programming/reinforcement learning methodology. We focus on the computation of a rollout policy, which is obtained by a single policy iteration starting from some known base policy and using some form of exact or approximate policy improvement. We indicate that, in a stochastic environment, the popu...
This article presents an algorithm that combines ARM, an algorithm based on FAST (Flexible Adaptable-Size Topology), with the Q-learning algorithm. ARM is a self-organizing architecture. By dynamically adjusting the size of the sensitivity regions of each neuron and adaptively pruning one of the redundant neurons, ARM can preserve resources (available neurons) to accommodate more categories. The...