The adversarial stochastic shortest path problem with unknown transition probabilities

Authors

Gergely Neu

András György

Csaba Szepesvári

Abstract

We consider online learning in a special class of episodic Markovian decision processes, namely, loop-free stochastic shortest path problems. In this problem, an agent has to traverse a finite directed acyclic graph with random transitions while maximizing the rewards obtained along the way. We assume that the reward function can change arbitrarily between consecutive episodes and is entirely revealed to the agent at the end of each episode. Previous work was concerned with the case when the stochastic dynamics are known ahead of time; the main novelty of this paper is that this assumption is lifted. We propose an algorithm called “follow the perturbed optimistic policy” that combines ideas from the “follow the perturbed leader” method for online learning of arbitrary sequences and “upper confidence reinforcement learning”, an algorithm for regret minimization in Markovian decision processes (with a fixed reward function). We prove that the expected cumulative regret of our algorithm is of order $L|X||A|\sqrt{T}$ up to logarithmic factors, where $L$ is the length of the longest path in the graph, $X$ is the state space, $A$ is the action space, and $T$ is the number of episodes. To our knowledge, this is the first algorithm that learns and controls stochastic and adversarial components in an online fashion at the same time.
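The abstract describes the algorithm only at a high level. Below is a minimal Python sketch of one way its two ingredients could fit together, under assumptions that are not taken from the paper: the confidence set for each state-action pair (x, a) is an L1 ball of radius `eps` around the empirical next-state distribution (the UCRL2-style construction), the "perturbed leader" part adds i.i.d. exponential noise with mean `scale` to the cumulative rewards, and planning is plain optimistic backward induction over the layers of the loop-free graph. The names `optimistic_distribution`, `optimistic_policy`, `fpop_policy`, `eps`, and `scale` are hypothetical; how `eps` shrinks with visit counts is left as an input. This is an illustration, not the authors' exact construction.

```python
import random
from typing import Dict, Hashable, List, Sequence, Tuple

# Assumption (not from the paper): L1-ball confidence sets and exponential
# perturbations; this only illustrates the "perturb, then plan optimistically" idea.
Key = Tuple[Hashable, Hashable]  # (state, action)


def optimistic_distribution(p_hat: Sequence[float], values: Sequence[float],
                            eps: float) -> List[float]:
    """Maximize sum(p[i] * values[i]) over distributions p with
    ||p - p_hat||_1 <= eps (the inner step of UCRL2's extended value iteration)."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    p = list(p_hat)
    best = order[0]
    # Shift up to eps/2 extra probability onto the most valuable next state...
    p[best] = min(1.0, p_hat[best] + eps / 2.0)
    # ...and remove the excess from the least valuable states first.
    for i in reversed(order):
        total = sum(p)
        if total <= 1.0:
            break
        p[i] = max(0.0, 1.0 - (total - p[i]))
    return p


def optimistic_policy(layers: List[List[Hashable]], actions: List[Hashable],
                      p_hat: Dict[Key, List[float]], eps: Dict[Key, float],
                      reward: Dict[Key, float]):
    """Backward induction on a loop-free layered MDP. layers[0] holds the start
    state, layers[-1] the terminal state(s); p_hat[(x, a)] is aligned with the
    states of the layer that follows x."""
    value: Dict[Hashable, float] = {x: 0.0 for x in layers[-1]}
    policy: Dict[Hashable, Hashable] = {}
    for l in range(len(layers) - 2, -1, -1):
        next_values = [value[y] for y in layers[l + 1]]
        for x in layers[l]:
            best_a, best_q = None, float("-inf")
            for a in actions:
                p = optimistic_distribution(p_hat[(x, a)], next_values, eps[(x, a)])
                q = reward[(x, a)] + sum(pi * v for pi, v in zip(p, next_values))
                if q > best_q:
                    best_a, best_q = a, q
            policy[x], value[x] = best_a, best_q
    return policy, value


def fpop_policy(layers, actions, p_hat, eps, cum_reward: Dict[Key, float],
                scale: float, rng: random.Random):
    """Policy for the next episode (sketch): add exponential noise to the
    cumulative rewards seen so far, then plan optimistically against them."""
    perturbed = {k: r + rng.expovariate(1.0 / scale) for k, r in cum_reward.items()}
    return optimistic_policy(layers, actions, p_hat, eps, perturbed)


# Tiny example: start state "s", middle layer {"u", "v"}, terminal state "end".
layers = [["s"], ["u", "v"], ["end"]]
actions = ["left", "right"]
p_hat = {("s", "left"): [0.9, 0.1], ("s", "right"): [0.1, 0.9]}
p_hat.update({(x, a): [1.0] for x in ("u", "v") for a in actions})
eps = {k: 0.0 for k in p_hat}
reward = {("s", "left"): 0.0, ("s", "right"): 0.0, ("u", "left"): 1.0,
          ("u", "right"): 0.0, ("v", "left"): 0.0, ("v", "right"): 5.0}

policy, value = optimistic_policy(layers, actions, p_hat, eps, reward)
print(policy)                # {'u': 'left', 'v': 'right', 's': 'right'}
print(round(value["s"], 6))  # 4.6
```

With `eps` set to zero the planner simply trusts the empirical transitions; larger radii let it move probability toward high-value successors, which is the optimism that drives exploration when the transition probabilities are unknown.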


Similar resources

The Online Loop-free Stochastic Shortest-Path Problem

We consider a stochastic extension of the loop-free shortest path problem with adversarial rewards. In this episodic Markov decision problem an agent traverses through an acyclic graph with random transitions: at each step of an episode the agent chooses an action, receives some reward, and arrives at a random next state, where the reward and the distribution of the next state depend on the act...


Dynamic Multi Period Production Planning Problem with Semi Markovian Variable Cost (TECHNICAL NOTE)

This paper develops a method for solving the single-product multi-period production-planning problem, in which the production and inventory costs of each period are concave and backlogging is not permitted. It is also assumed that the unit variable cost of production evolves according to a continuous-time Markov process. We prove that this production-planning problem can be stated as a ...


Robust Planning with (L)RTDP

Stochastic Shortest Path problems (SSPs), a subclass of Markov Decision Problems (MDPs), can be efficiently dealt with using Real-Time Dynamic Programming (RTDP). Yet, MDP models are often uncertain (obtained through statistics or guessing). The usual approach is robust planning: searching for the best policy under the worst model. This paper shows how RTDP can be made robust in the common case...


Planning with Robust (L)RTDP


Near Optimal Adaptive Shortest Path Routing with Stochastic Links States under Adversarial Attack

We consider the shortest path routing (SPR) of a network with stochastically time varying link metrics under potential adversarial attacks. Due to potential denial of service attacks, the distributions of link states could be stochastic (benign) or adversarial at different temporal and spatial locations. Without any a priori, designing an adaptive SPR protocol to cope with all possible situatio...


Publication date: 2012
