Search results for: q policy
Number of results: 381585
In this paper, we consider an intrusion detection application for Wireless Sensor Networks (WSNs). We study the problem of scheduling the sleep times of the individual sensors, where the objective is to maximize the network lifetime while keeping the tracking error to a minimum. We formulate this problem as a partially-observable Markov decision process (POMDP) with continuous state-action spac...
This paper presents a fast Reinforcement Learning (RL) algorithm for solving Partially Observable Markov Decision Process (POMDP) problems. The proposed algorithm is devised to provide a policy-making framework for Network Management Systems (NMS), which is in essence an engineering application without an exact model. The algorithm consists of two phases. Firstly, the model is estimated and policy...
A temporally abstract action, or an option, is specified by a policy and a termination condition: the policy guides option behavior, and the termination condition roughly determines its length. Generally, learning with longer options (like learning with multi-step returns) is known to be more efficient. However, if the option set for the task is not ideal, and cannot express the primitive optim...
Generally, the derivation of an inventory policy requires the knowledge of the underlying demand distribution. Unfortunately, in many settings such as retail, demand is not completely observable in a direct way or inventory records may be inaccurate. A variety of factors, including the potential inaccuracy of inventory records, motivate retailers to seek replenishment policies with a fixed orde...
Reinforcement learning is one of the more attractive machine learning technologies, due to its unsupervised learning structure and ability to continually learn even as the operating environment changes. Additionally, applying reinforcement learning to multiple cooperative software agents (a multi-agent system) not only allows each individual ag...
We quantify spillbacks from US monetary policy. We use structural scenario analysis and minimum relative entropy methods. Spillbacks reflect a non-trivial share of the domestic effect. They materialise through Tobin’s q/cash flow and stock market wealth effects. Spillovers from US monetary policy entail spillbacks to the domestic economy. Applying counterfactual analyses in a Bayesian proxy vector-autoregressive model, we find that spillbacks account for...
In order to support research on school effectiveness, there is a need for valid and reliable instruments to assess the policymaking capacities of schools. Increasingly, policymaking is seen as a shared responsibility of the entire pedagogical team of the school. In this article, data were analysed from a sample of 1,696 (care) teachers, coordinators, and principals in 77 Flemish primary schools on critical aspects concerning the validity and reliability of the Po...
A novel reinforcement learning algorithm is introduced for multiarmed restless bandits with average reward, using the paradigms of Q-learning and the Whittle index. Specifically, we leverage the structure of the Whittle index policy to reduce the search space of Q-learning, resulting in major computational gains. Rigorous convergence analysis is provided, supported by numerical experiments. The experiments show excellent empir...
In this paper, a new optimal policy is introduced to determine, adapt, and protect the Generalized MultiProtocol Label Switching (GMPLS) network topology based on the current traffic load. The Integrated Traffic Engineering (ITE) paradigm provides mechanisms for dynamic addition of physical capacity to optical networks. In the absence of such mechanisms, the rejection of incoming requests may b...