Search results for: s policy

Number of results: 960898  

Journal: CoRR 2017
Kamil Ciosek Shimon Whiteson

We propose expected policy gradients (EPG), which unify stochastic policy gradients (SPG) and deterministic policy gradients (DPG) for reinforcement learning. Inspired by Expected SARSA, EPG integrates across the action when estimating the gradient, instead of relying only on the action in the sampled trajectory. We establish a new general policy gradient theorem, of which the stochastic and de...
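The difference between using only the sampled action and integrating across actions can be sketched for a single state with a small discrete action set. This is an illustrative simplification, not the authors' implementation: the softmax parameterisation, the known action values `q`, and all function names are assumptions made for the example.

```python
import math
import random


def softmax(theta):
    """Numerically stable softmax over a list of action preferences."""
    m = max(theta)
    exps = [math.exp(x - m) for x in theta]
    z = sum(exps)
    return [e / z for e in exps]


def grad_log_pi(theta, a):
    """Gradient of log softmax(theta)[a] w.r.t. theta: one_hot(a) - pi."""
    pi = softmax(theta)
    return [(1.0 if i == a else 0.0) - p for i, p in enumerate(pi)]


def sampled_gradient(theta, q, rng):
    """Stochastic-style estimate: uses only the one action that was sampled."""
    pi = softmax(theta)
    a = rng.choices(range(len(theta)), weights=pi)[0]
    return [g * q[a] for g in grad_log_pi(theta, a)]


def integrated_gradient(theta, q):
    """Expected-style estimate: sums over every action, weighted by pi."""
    pi = softmax(theta)
    total = [0.0] * len(theta)
    for a, p in enumerate(pi):
        for i, g in enumerate(grad_log_pi(theta, a)):
            total[i] += p * g * q[a]
    return total


def mean_sampled_gradient(theta, q, n, seed=0):
    """Average of n sampled estimates, for comparison with the integrated one."""
    rng = random.Random(seed)
    total = [0.0] * len(theta)
    for _ in range(n):
        for i, g in enumerate(sampled_gradient(theta, q, rng)):
            total[i] += g / n
    return total
```

Under these assumptions, the integrated estimate is deterministic for a given state, while the sampled estimate only matches it on average over many draws.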

2007
Robby Goetschalckx Jan Ramon

We consider the problem of policy learning in a Markov Decision Process (MDP) where only a restricted, limited subset of the full policy space can be used. An MDP consists of a state space S, a set of actions A, a transition probability function t(s, a, s′) and a reward function R : S → ℝ. Also there is the discount factor γ. The problem is to find a policy, a mapping from states to actions π : S...
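The MDP ingredients named in this abstract (S, A, t, R, γ, and a policy π) can be written down as a minimal sketch. The two-state toy problem, the dictionary encoding, and the helper names `evaluate_policy` and `best_in_subset` are assumptions made for illustration; they are not taken from the paper.

```python
from typing import Dict, List, Tuple

State = str
Action = str
# t[(s, a)] maps each successor state s' to the probability t(s, a, s').
Transition = Dict[Tuple[State, Action], Dict[State, float]]
Policy = Dict[State, Action]


def evaluate_policy(states: List[State], policy: Policy, t: Transition,
                    R: Dict[State, float], gamma: float,
                    sweeps: int = 500) -> Dict[State, float]:
    """Iterate V(s) = R(s) + gamma * sum_s' t(s, pi(s), s') * V(s')."""
    V = {s: 0.0 for s in states}
    for _ in range(sweeps):
        V = {
            s: R[s] + gamma * sum(p * V[s2] for s2, p in t[(s, policy[s])].items())
            for s in states
        }
    return V


def best_in_subset(states: List[State], candidates: List[Policy], t: Transition,
                   R: Dict[State, float], gamma: float, start: State) -> Policy:
    """Pick the candidate policy with the highest value at the start state."""
    return max(candidates,
               key=lambda pi: evaluate_policy(states, pi, t, R, gamma)[start])


# Hypothetical toy MDP: reward is earned only while in state "b".
STATES = ["a", "b"]
T: Transition = {
    ("a", "stay"): {"a": 1.0},
    ("a", "go"): {"b": 1.0},
    ("b", "stay"): {"b": 1.0},
    ("b", "go"): {"a": 1.0},
}
REWARD = {"a": 0.0, "b": 1.0}
GAMMA = 0.9
# A restricted subset of the four possible deterministic policies.
ALLOWED: List[Policy] = [
    {"a": "stay", "b": "stay"},
    {"a": "go", "b": "stay"},
]
```

Calling `best_in_subset(STATES, ALLOWED, T, REWARD, GAMMA, start="a")` searches only the allowed policies, which mirrors the restricted-policy-space setting in a very reduced form.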

Journal: Central Eurasia Studies (مطالعات اوراسیای مرکزی) 0
Masoud Mousavi Shafaee (Assistant Professor, Department of International Relations, Faculty of Humanities, Tarbiat Modares University), Mohammad Taghi Bazgard (PhD student in International Relations, Faculty of Humanities, Tarbiat Modares University)

After the collapse of the Soviet Union, Russia faced a strategic vacuum in its foreign policy for a time. Thus, we witnessed a period of unstable, inactive and reactionary policy-making in Russian foreign policy while the US was expanding its influence on the Russian periphery. But under Putin's administration, Russian foreign policy has evolved dramatically. Putin put the infl...

2004
Katia Sycara David Martin Deborah L. McGuinness Sheila McIlraith Massimo Paolucci Tim Finin Grit Denker Lalana Kagal

Description of constraints and capabilities of Web services is crucial to enable flexible service discovery, interaction and management. Constraints and capabilities are at the heart of representing policies that govern the behavior of Web services. Because policies regulate the behavior of different parties they are effective if and only if they are unambiguously understood by all the parties ...

2013
Ricardo J. Caballero Emmanuel Farhi

The global economy has a chronic shortage of safe assets which lies behind many recent macroeconomic imbalances. This paper provides a simple model of the Safe Asset Mechanism (SAM), its recessionary safety traps, and its policy antidotes. Safety traps share many common features with conventional liquidity traps, but also exhibit important differences, in particular with respect to their reacti...

Journal: Bulletin of Taras Shevchenko National University of Kyiv. History 2019

2004
Lihong Li Vadim Bulitko Russell Greiner

We investigate the problem of using function approximation in reinforcement learning where the agent’s policy is represented as a classifier mapping states to actions. High classification accuracy is usually deemed to correlate with high policy quality. But this is not necessarily the case as increasing classification accuracy can actually decrease the policy’s quality. This phenomenon takes pl...
