Search results for: finite planning horizon
Number of results: 479,354
This paper studies Value-at-Risk problems in finite-horizon Markov decision processes (MDPs) with finite state space and two forms of reward function. First, we study the effect of the reward function on two criteria in a short-horizon MDP. Second, for long-horizon MDPs, we estimate the total-reward distribution in a finite-horizon Markov chain (MC) with the help of spectral theory and the central limit theorem...
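For intuition on the short-horizon case, here is a minimal sketch (a hypothetical three-state chain with made-up transition probabilities and integer rewards) that computes the exact total-reward distribution of a finite-horizon MC by dynamic programming and reads off a Value-at-Risk quantile; the spectral/CLT estimate above is what replaces this enumeration when the horizon is long.

```python
import numpy as np
from collections import defaultdict

# Hypothetical 3-state chain: P[i, j] = transition probability, r[i] = integer reward.
P = np.array([[0.6, 0.3, 0.1],
              [0.2, 0.5, 0.3],
              [0.1, 0.2, 0.7]])
r = np.array([0, 1, 2])
T = 10        # planning horizon
alpha = 0.05  # VaR level

# dist[(s, total)] = probability of being in state s having accumulated `total` reward.
dist = defaultdict(float)
dist[(0, 0)] = 1.0  # start in state 0 with zero accumulated reward
for _ in range(T):
    step = defaultdict(float)
    for (s, total), p in dist.items():
        for s2 in range(len(r)):
            # collect r[s] in the current state, then transition
            step[(s2, total + r[s])] += p * P[s, s2]
    dist = step

# Marginalise out the state, then read off the alpha-quantile (Value-at-Risk).
totals = defaultdict(float)
for (s, total), p in dist.items():
    totals[total] += p
cdf = 0.0
for total in sorted(totals):
    cdf += totals[total]
    if cdf >= alpha:
        print(f"VaR at level {alpha} of the {T}-step total reward: {total}")
        break
```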
We address the problem of online path planning for optimal sensing with a mobile robot. The objective of the robot is to learn the most about its pose and the environment given time constraints. We use a POMDP with a utility function that depends on the belief state to model the finite-horizon planning problem. We replan as the robot progresses through the environment. The POMDP is high-dimensional...
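As a sketch of the belief state such a utility function is defined over, here is a minimal one-dimensional histogram (Bayes) filter for a robot's pose; the grid size, motion model, and landmark sensor are all illustrative assumptions, not the paper's setup.

```python
import numpy as np

# One-dimensional histogram (Bayes) filter: maintains the belief over the
# robot's pose that a belief-dependent planning utility would be evaluated on.
N = 10                                  # grid cells (on a ring)
belief = np.full(N, 1.0 / N)            # uniform prior over pose
doors = {2, 7}                          # hypothetical landmark locations

def predict(belief):
    """Motion update: move one cell right on the ring, with some slip."""
    return 0.8 * np.roll(belief, 1) + 0.2 * belief

def correct(belief, saw_door):
    """Bayes correction with a noisy landmark detector (90% accurate)."""
    lik = np.array([0.9 if (i in doors) == saw_door else 0.1 for i in range(N)])
    post = lik * belief
    return post / post.sum()

for saw_door in [True, False, False]:   # a short stream of observations
    belief = correct(predict(belief), saw_door)
print("pose belief:", np.round(belief, 3))
```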
Parametric policy search algorithms are among the methods of choice for the optimisation of Markov Decision Processes, with Expectation Maximisation and natural gradient ascent being popular methods in this field. In this article we provide a unifying perspective on these two algorithms by showing that their search directions in the parameter space are closely related to the search direction of...
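To make the two search directions concrete, the toy below runs REINFORCE-style policy search on a hypothetical three-armed bandit and preconditions the vanilla gradient with the Fisher information of the softmax policy to obtain the natural-gradient direction; the action values and step size are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
q = np.array([1.0, 2.0, 1.5])      # hypothetical true action values (bandit)
theta = np.zeros(3)                 # softmax policy parameters

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

for step in range(500):
    pi = softmax(theta)
    a = rng.choice(3, p=pi)
    reward = q[a] + rng.normal()
    # Vanilla ("Euclidean") policy gradient estimate: grad log pi(a) * reward.
    grad = -pi.copy()
    grad[a] += 1.0
    g = grad * reward
    # Natural gradient: precondition with the Fisher information of the
    # softmax, F = diag(pi) - pi pi^T (singular, so use a pseudo-inverse).
    F = np.diag(pi) - np.outer(pi, pi)
    natural_g = np.linalg.pinv(F) @ g
    theta += 0.05 * natural_g

print("learned policy:", softmax(theta))
```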
We present multi-agent A* (MAA*), the first complete and optimal heuristic search algorithm for solving decentralized partially observable Markov decision problems (DEC-POMDPs) with finite horizon. The algorithm is suitable for computing optimal plans for a cooperative group of agents that operate in a stochastic environment such as multi-robot coordination, network traffic control, or distributed...
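The skeleton below shows only the generic best-first structure that MAA* instantiates: expand the partial plan with the highest admissible upper bound until a full-horizon plan tops the queue. In MAA* the nodes are joint policy trees for all agents and the bounds come from relaxed solutions; the toy problem here is a deliberate stand-in, not a DEC-POMDP.

```python
import heapq

# Best-first search over partial plans, ranked by exact value so far plus an
# admissible *upper bound* on the remainder (for maximization, admissible
# means never underestimating). All numbers are illustrative.
HORIZON = 3
ACTIONS = [0, 1]
REWARD = {0: 1.0, 1: 0.8}           # hypothetical per-step rewards
UPPER_PER_STEP = max(REWARD.values())

def upper_bound(plan):
    """Exact value of the partial plan plus an optimistic bound on the rest."""
    exact = sum(REWARD[a] for a in plan)
    return exact + (HORIZON - len(plan)) * UPPER_PER_STEP

frontier = [(-upper_bound(()), ())]   # max-heap via negated priority
while frontier:
    neg_bound, plan = heapq.heappop(frontier)
    if len(plan) == HORIZON:          # a full-horizon plan with the best bound
        print("optimal plan:", plan, "value:", -neg_bound)
        break
    for a in ACTIONS:                 # expand: extend the partial plan
        child = plan + (a,)
        heapq.heappush(frontier, (-upper_bound(child), child))
```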
This paper describes TuLiP, a Python-based software toolbox for the synthesis of embedded control software that is provably correct with respect to an expressive subset of linear temporal logic (LTL) specifications. TuLiP combines routines for (1) finite state abstraction of control systems, (2) digital design synthesis from LTL specifications, and (3) receding horizon planning. The underlying ...
A robotic agent is tasked to explore an a priori unknown environment. The objective is to maximize the amount of information about the partially observable state. The problem is formulated as a partially observable Markov decision process (POMDP) with an information-theoretic objective function, further approximated to a form suitable for robotic exploration. An open-loop approximation is applied...
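A common one-step approximation of such an information-theoretic objective is expected entropy reduction, i.e. the mutual information between observation and state. The sketch below greedily picks the more informative of two hypothetical sensing actions over a made-up discrete belief; the names and numbers are assumptions for illustration.

```python
import numpy as np

def entropy(p):
    p = p[p > 0]
    return -(p * np.log(p)).sum()

# Hypothetical discrete belief over 4 environment hypotheses.
belief = np.array([0.4, 0.3, 0.2, 0.1])

# Candidate sensing actions; likelihoods[a][z, h] = P(observation z | hypothesis h).
likelihoods = {
    "scan_left":  np.array([[0.9, 0.1, 0.5, 0.5],
                            [0.1, 0.9, 0.5, 0.5]]),
    "scan_right": np.array([[0.5, 0.5, 0.9, 0.1],
                            [0.5, 0.5, 0.1, 0.9]]),
}

def expected_information_gain(belief, lik):
    """Prior entropy minus expected posterior entropy under the sensor model."""
    gain = entropy(belief)
    for z in range(lik.shape[0]):
        pz = lik[z] @ belief                 # P(z)
        if pz > 0:
            posterior = lik[z] * belief / pz # Bayes update
            gain -= pz * entropy(posterior)
    return gain

best = max(likelihoods, key=lambda a: expected_information_gain(belief, likelihoods[a]))
print("most informative action:", best)
```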
Lot-sizing and capacity planning are important supply chain decisions, and competition and cooperation affect the performance of these decisions. In this paper, we look into the dynamic lot-sizing and resource-competition problem of an industry consisting of multiple firms. A capacity competition model combining the complexity of time-varying demand with cost functions and economies of scale ar...
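The single-firm building block of such models is the classic dynamic lot-sizing problem, solvable by the Wagner-Whitin dynamic program sketched below (demands and costs are invented); the multi-firm capacity-competition model layers strategic interaction on top of this kind of subproblem.

```python
# Single-firm dynamic lot-sizing (Wagner-Whitin) by dynamic programming.
demand = [20, 50, 10, 40]   # per-period demand (illustrative)
K = 100.0                   # fixed setup cost per order
h = 1.0                     # holding cost per unit per period

T = len(demand)
INF = float("inf")
best = [0.0] + [INF] * T    # best[t] = min cost to satisfy periods 0..t-1
for t in range(1, T + 1):
    for s in range(t):      # last order placed in period s covers s..t-1
        holding = sum(h * (j - s) * demand[j] for j in range(s, t))
        best[t] = min(best[t], best[s] + K + holding)
print("minimum total cost:", best[T])
```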
Previous work on planning as active inference addresses finite-horizon problems with solutions valid only for online planning. We propose solving the general Stochastic Shortest-Path Markov Decision Process (SSP MDP) as probabilistic inference. Furthermore, we discuss offline methods for planning under uncertainty. In an SSP MDP, the horizon is indefinite and unknown a priori. SSP MDPs generalize finite- and infinite-horizon MDPs and are widely used in artificial ...
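For a concrete feel for SSP MDPs, the sketch below runs plain value iteration on a tiny hand-made instance: expected costs are minimized until an absorbing goal state is reached, with no fixed horizon. This is the standard dynamic-programming solution, not the probabilistic-inference formulation the paper proposes; all numbers are illustrative.

```python
import numpy as np

# Value iteration for a tiny Stochastic Shortest-Path MDP (indefinite horizon).
# States 0, 1 are non-goal; state 2 is the absorbing goal.
n_states, n_actions, GOAL = 3, 2, 2
P = np.zeros((n_actions, n_states, n_states))
P[0] = [[0.7, 0.2, 0.1], [0.0, 0.6, 0.4], [0.0, 0.0, 1.0]]  # action "safe"
P[1] = [[0.2, 0.3, 0.5], [0.1, 0.2, 0.7], [0.0, 0.0, 1.0]]  # action "risky"
cost = np.array([[1.0, 2.0], [1.0, 3.0], [0.0, 0.0]])       # cost[s, a]

V = np.zeros(n_states)
for _ in range(1000):
    Q = cost + np.einsum("asj,j->sa", P, V)   # Bellman backup for each (s, a)
    V_new = Q.min(axis=1)
    V_new[GOAL] = 0.0                         # goal is absorbing and free
    if np.max(np.abs(V_new - V)) < 1e-9:
        break
    V = V_new
print("expected cost-to-go:", V, "greedy policy:", Q.argmin(axis=1))
```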
Evidence suggests that maternal and offspring smoking behaviours are correlated. Little is known about the mechanisms through which this intergenerational transfer occurs. This paper explores the role of time preferences. Although time preference is likely to be heritable and correlated with health investments, its role in the intergenerational transmission of smoking has not been explored previously...
Given a Markov decision process (MDP) with expressed prior uncertainties in the process transition probabilities, we consider the problem of computing a policy that optimizes expected total (finite-horizon) reward. Implicitly, such a policy would effectively resolve the "exploration-versus-exploitation tradeoff" faced, for example, by an agent that seeks to optimize total reinforcement obtained...
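One classical way such a policy resolves exploration versus exploitation is to plan directly in belief space. The sketch below does exact backward induction for a Bayes-adaptive two-armed Bernoulli bandit with uniform Beta priors (a deliberately tiny stand-in for the paper's setting): pulling an arm both earns reward and updates the posterior, so the optimal finite-horizon plan weighs information gathering against immediate payoff automatically.

```python
from functools import lru_cache

# Backward induction for a Bayes-adaptive Bernoulli bandit: the "state" is the
# Beta posterior (wins, losses) of each arm, so planning over it implicitly
# resolves the exploration-versus-exploitation tradeoff. Horizon is made up.
HORIZON = 10

@lru_cache(maxsize=None)
def value(t, a_w, a_l, b_w, b_l):
    """Max expected future reward with t pulls left; Beta(w+1, l+1) posteriors."""
    if t == 0:
        return 0.0
    best = 0.0
    for arm, (w, l) in enumerate([(a_w, a_l), (b_w, b_l)]):
        p = (w + 1) / (w + l + 2)   # posterior mean success probability
        if arm == 0:
            win  = value(t - 1, a_w + 1, a_l, b_w, b_l)
            lose = value(t - 1, a_w, a_l + 1, b_w, b_l)
        else:
            win  = value(t - 1, a_w, a_l, b_w + 1, b_l)
            lose = value(t - 1, a_w, a_l, b_w, b_l + 1)
        best = max(best, p * (1 + win) + (1 - p) * lose)
    return best

print("value of the optimal 10-step plan:", value(HORIZON, 0, 0, 0, 0))
```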