Search results for: q policy
Number of results: 381585
Deglobalization, as opposed to globalization, appears in the world order due to local solutions to problems and border controls, ignoring the principles of treaties, trade wars, and the expansion of regionalism. In addition, slowbalization helps shrink the global flow of trade, information, and societal and cultural exchange dynamism. However, this scary order, triggered by deglobalization and slowbalization, significantly impac...
We use single-agent and multi-agent Reinforcement Learning (RL) for learning dialogue policies in a resource allocation negotiation scenario. Two agents learn concurrently by interacting with each other without any need for simulated users (SUs) to train against or corpora to learn from. In particular, we compare the Q-learning, Policy Hill-Climbing (PHC) and Win or Learn Fast Policy Hill-Climbi...
OBJECTIVES: Measuring and monitoring health system performance is important, albeit controversial. The technical, logistic and financial challenges are formidable. We introduced a system of measurement, which we call Q, to measure the quality of hospital clinical performance across a range of facilities. This paper describes how Q was developed, implemented in hospitals in the Philippines and how it ...
Model-free deep reinforcement learning (RL) algorithms have been demonstrated on a range of challenging decision making and control tasks. However, these methods typically suffer from two major challenges: very high sample complexity and brittle convergence properties, which necessitate meticulous hyperparameter tuning. Both of these challenges severely limit the applicability of such methods t...
Keqi Yan: Fluid Models for Production-Inventory Systems (Under the direction of Professor Vidyadhar G. Kulkarni) We consider a single stage production-inventory system whose production and demand rates are modulated by a finite state Markov chain called the environment. Supplementary orders can be placed from external suppliers when needed. We model this system by a fluid-flow system and derive...
The control of a stochastic manufacturing system that executes capital asset repairs and remanufacturing in an integrated system is examined. The remanufacturing resources respond to planned returns of worn-out equipment at the end of its expected life and unplanned returns triggered by major equipment failures. Remanufacturing operations for planned demand can be executed at different rates...
We propose two variants of the Q-learning algorithm that (both) use two timescales. One of these updates Q-values of all feasible state-action pairs at each instant while the other updates Q-values of states with actions chosen according to the ‘current’ randomized policy updates. A sketch of convergence of the algorithms is shown. Finally, numerical experiments using the proposed algorithms fo...
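For orientation, the following is a minimal sketch of the standard single-timescale tabular Q-learning update that such variants build on. It is not the two-timescale algorithm from the abstract above, whose details are not given here. The chain environment, the function name `q_learning`, and all parameter values are hypothetical and chosen only for illustration.

```python
import random


def q_learning(n_states=5, episodes=500, alpha=0.1, gamma=0.9, seed=0):
    """Tabular Q-learning on a hypothetical deterministic chain.

    States are 0..n_states-1. Action 0 moves left, action 1 moves right.
    Entering the last state yields reward 1 and ends the episode.
    """
    rng = random.Random(seed)
    goal = n_states - 1
    # One Q-value per (state, action) pair, initialised to zero.
    q = [[0.0, 0.0] for _ in range(n_states)]

    for _ in range(episodes):
        s = 0
        while s != goal:
            # Q-learning is off-policy, so a uniformly random behaviour
            # policy is enough for this small illustrative problem.
            a = rng.randrange(2)
            s_next = min(s + 1, goal) if a == 1 else max(s - 1, 0)
            done = s_next == goal
            reward = 1.0 if done else 0.0
            # Bootstrapped target: reward plus discounted best next value.
            target = reward + (0.0 if done else gamma * max(q[s_next]))
            # Move the current estimate a fraction alpha toward the target.
            q[s][a] += alpha * (target - q[s][a])
            s = s_next
    return q


if __name__ == "__main__":
    for state, values in enumerate(q_learning()):
        print(state, [round(v, 3) for v in values])
```

In this sketch only the visited state-action pair is updated at each step. The abstract contrasts that kind of update with one that refreshes all feasible state-action pairs at each instant.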
This paper investigates an approach to designing and building adaptive agents. The main contribution is the use of a symbolic machine learning system for approximating the policy and Q functions that are at the heart of the agent. Under the assumption that sufficient knowledge of the application domain is available, it is shown how this knowledge can be provided to the agent in the form of symb...