policy option

Learnings Options End-to-End for Continuous Action Tasks

Journal: CoRR 2017

Martin Klissarov Pierre-Luc Bacon Jean Harb Doina Precup

We present new results on learning temporally extended actions for continuous tasks, using the options framework (Sutton et al. [1999b], Precup [2000]). In order to achieve this goal we work with the option-critic architecture (Bacon et al. [2017]) using a deliberation cost and train it with proximal policy optimization (Schulman et al. [2017]) instead of vanilla policy gradient. Results on Muj...


Policy Capacity for Health Reform: Necessary but Insufficient; Comment on “Health Reform Requires Policy Capacity”

Journal: International Journal of Health Policy and Management 2016

Owen Adams

Forest and colleagues have persuasively made the case that policy capacity is a fundamental prerequisite to health reform. They offer a comprehensive life-cycle definition of policy capacity and stress that it involves much more than problem identification and option development. I would like to offer a Canadian perspective. If we define health reform as re-orienting the health system from acut...


The Option-Critic Architecture

2017

Pierre-Luc Bacon Jean Harb Doina Precup

Temporal abstraction is key to scaling up learning and planning in reinforcement learning. While planning with temporally extended actions is well understood, creating such abstractions autonomously from data has remained challenging. We tackle this problem in the framework of options [Sutton, Precup & Singh, 1999; Precup, 2000]. We derive policy gradient theorems for options and propose a new ...


Irreversibility and Uncertainty in Species Valuation

2009

Raul Acevedo Amy Weaver

This paper incorporates an option value into deforestation policy analysis. Similar to an option value in finance, the option value here reflects the advantage to delaying irreversible species extinction until more information about the uncertain value of species is known. The return from species is modeled as a stochastic flow of benefits which ceases if policy makers choose to deforest. Defor...


Unified Inter and Intra Options Learning Using Policy Gradient Methods

2011

Kfir Y. Levy Nahum Shimkin

Temporally extended actions (or macro-actions) have proven useful for speeding up planning and learning, adding robustness, and building prior knowledge into AI systems. The options framework, as introduced in Sutton, Precup and Singh (1999), provides a natural way to incorporate macro-actions into reinforcement learning. In the subgoals approach, learning is divided into two phases, first lear...


GQ(λ): A general gradient algorithm for temporal-difference prediction learning with eligibility traces

2010

Hamid Reza Maei Richard S. Sutton

A new family of gradient temporal-difference learning algorithms has recently been introduced by Sutton, Maei and others in which function approximation is much more straightforward. In this paper, we introduce the GQ(λ) algorithm, which can be seen as an extension of that work to a more general setting including eligibility traces and off-policy learning of temporally abstract predictions. These ...


IMF Staff Papers vol. 51, no. 3

2004

Helge Berger

The “conservative central banker” has come under attack recently. Explicitly modeling the interaction of a trade union with monetary policy, it has been argued that the standard solution to the inflationary bias in monetary policy might actually be welfare-reducing if the trade union has an exogenous preference against inflation. We reframe this discussion in a standard trade union model. We sh...


Using Newborn Screening Bloodspots for Research: Public Preferences for Policy Options.

Journal: Pediatrics 2016

Robin Z Hayeems Fiona A Miller Carolyn J Barg Yvonne Bombard Celine Cressman Michael Painter-Main Brenda Wilson Julian Little Judith Allanson Denise Avard Yves Giguere Pranesh Chakraborty June C Carroll

OBJECTIVES: Retaining residual newborn screening (NBS) bloodspots for medical research remains contentious. To inform this debate, we sought to understand public preferences for, and reasons for preferring, alternative policy options. METHODS: We assessed preferences among 4 policy options for research use of residual bloodspots through a bilingual national Internet survey of a representative s...


Learning with Options that Terminate Off-Policy

Journal: CoRR 2017

Anna Harutyunyan Peter Vrancx Pierre-Luc Bacon Doina Precup Ann Nowé

A temporally abstract action, or an option, is specified by a policy and a termination condition: the policy guides option behavior, and the termination condition roughly determines its length. Generally, learning with longer options (like learning with multi-step returns) is known to be more efficient. However, if the option set for the task is not ideal, and cannot express the primitive optim...


Dispensing with NAFTA Rules of Origin? Some Policy Options for Canada

2009

Patrick Georges

Increased market access from Free Trade Agreements (FTAs) promised by policy makers is often diluted by preferential rules of origin (ROO). This paper discusses two policy options (one direct and one indirect) with regard to limiting the impact of NAFTA ROO on trade, and illustrates the impact on GDP and welfare of these options using a computable general equilibrium methodology. The first (di...
