Search results for: policy option

Number of results: 333995

Journal: :CoRR 2017
Martin Klissarov Pierre-Luc Bacon Jean Harb Doina Precup

We present new results on learning temporally extended actions for continuous tasks, using the options framework (Sutton et al. [1999b], Precup [2000]). In order to achieve this goal we work with the option-critic architecture (Bacon et al. [2017]) using a deliberation cost and train it with proximal policy optimization (Schulman et al. [2017]) instead of vanilla policy gradient. Results on Muj...

Journal: :International Journal of Health Policy and Management 2016
Owen Adams

Forest and colleagues have persuasively made the case that policy capacity is a fundamental prerequisite to health reform. They offer a comprehensive life-cycle definition of policy capacity and stress that it involves much more than problem identification and option development. I would like to offer a Canadian perspective. If we define health reform as re-orienting the health system from acut...

2017
Pierre-Luc Bacon Jean Harb Doina Precup

Temporal abstraction is key to scaling up learning and planning in reinforcement learning. While planning with temporally extended actions is well understood, creating such abstractions autonomously from data has remained challenging. We tackle this problem in the framework of options [Sutton, Precup & Singh, 1999; Precup, 2000]. We derive policy gradient theorems for options and propose a new ...

2009
Raul Acevedo Amy Weaver

This paper incorporates an option value into deforestation policy analysis. Similar to an option value in finance, the option value here reflects the advantage to delaying irreversible species extinction until more information about the uncertain value of species is known. The return from species is modeled as a stochastic flow of benefits which ceases if policy makers choose to deforest. Defor...

2011
Kfir Y. Levy Nahum Shimkin

Temporally extended actions (or macro-actions) have proven useful for speeding up planning and learning, adding robustness, and building prior knowledge into AI systems. The options framework, as introduced in Sutton, Precup and Singh (1999), provides a natural way to incorporate macro-actions into reinforcement learning. In the subgoals approach, learning is divided into two phases, first lear...

2010
Hamid Reza Maei Richard S. Sutton

A new family of gradient temporal-difference learning algorithms has recently been introduced by Sutton, Maei and others in which function approximation is much more straightforward. In this paper, we introduce the GQ(λ) algorithm, which can be seen as an extension of that work to a more general setting including eligibility traces and off-policy learning of temporally abstract predictions. These ...

2004
Helge Berger

The “conservative central banker” has come under attack recently. Explicitly modeling the interaction of a trade union with monetary policy, it has been argued that the standard solution to the inflationary bias in monetary policy might actually be welfare-reducing if the trade union has an exogenous preference against inflation. We reframe this discussion in a standard trade union model. We sh...

Journal: :Pediatrics 2016
Robin Z Hayeems Fiona A Miller Carolyn J Barg Yvonne Bombard Celine Cressman Michael Painter-Main Brenda Wilson Julian Little Judith Allanson Denise Avard Yves Giguere Pranesh Chakraborty June C Carroll

OBJECTIVES: Retaining residual newborn screening (NBS) bloodspots for medical research remains contentious. To inform this debate, we sought to understand public preferences for, and reasons for preferring, alternative policy options. METHODS: We assessed preferences among 4 policy options for research use of residual bloodspots through a bilingual national Internet survey of a representative s...

Journal: :CoRR 2017
Anna Harutyunyan Peter Vrancx Pierre-Luc Bacon Doina Precup Ann Nowé

A temporally abstract action, or an option, is specified by a policy and a termination condition: the policy guides option behavior, and the termination condition roughly determines its length. Generally, learning with longer options (like learning with multi-step returns) is known to be more efficient. However, if the option set for the task is not ideal, and cannot express the primitive optim...

2009
Patrick Georges

Increased market access from Free Trade Agreements (FTAs) promised by policy makers is often diluted by preferential rules of origin (ROO). This paper discusses two policy options, one direct and one indirect, with regard to limiting the impact of NAFTA ROO on trade, and illustrates the impact on GDP and welfare of these options using a computable general equilibrium methodology. The first (di...
