Shortest Path under Uncertainty: Exploration versus Exploitation

Authors

  • Zhan Wei Lim
  • David Hsu
  • Wee Sun Lee

Abstract

In the Canadian Traveler Problem (CTP), a traveler seeks a shortest path to a destination through a road network in which, unknown to the traveler, some roads may be blocked. This paper studies the Bayesian CTP (BCTP), in which road states are correlated with known prior probabilities, so the traveler can infer the state of an unseen road from past observations of other, correlated roads. As generalized shortest-path problems, CTP and BCTP have important practical applications. We show that BCTP is NP-complete and give a polynomial-time approximation algorithm, Hedged Shortest Path under Determinization (HSPD), which approximates an optimal solution to within a polylogarithmic factor. In preliminary experiments, HSPD outperforms a widely used greedy algorithm and a state-of-the-art UCT-based search algorithm for CTP, especially when significant exploration is required.
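
To make the problem setting concrete, here is a minimal Python sketch of an optimistic replanning baseline for CTP: the traveler assumes every road not yet seen to be blocked is open, follows the shortest path under that assumption, and replans whenever a blocked road is observed. This is an illustration only. It is not HSPD, and whether it matches the greedy algorithm evaluated in the paper is an assumption. The graph, the names `shortest_path` and `optimistic_travel`, and the rule that a road's state is observed on reaching one of its endpoints are all hypothetical choices made for the example.

```python
import heapq

def shortest_path(graph, source, target, blocked):
    """Dijkstra that skips edges in `blocked`; returns (cost, path) or (inf, [])."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == target:
            break
        for v, w in graph[u].items():
            if frozenset((u, v)) in blocked:
                continue
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in dist:
        return float("inf"), []
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return dist[target], path[::-1]

def optimistic_travel(graph, source, target, truly_blocked):
    """Walk to `target`, assuming unobserved roads are open (illustrative baseline)."""
    known_blocked = set()
    position, travelled, route = source, 0.0, [source]
    while position != target:
        # Assumption: roads touching the current node reveal their state on arrival.
        for v in graph[position]:
            edge = frozenset((position, v))
            if edge in truly_blocked:
                known_blocked.add(edge)
        _, path = shortest_path(graph, position, target, known_blocked)
        if not path:
            return float("inf"), route  # target unreachable given what is known
        nxt = path[1]
        travelled += graph[position][nxt]
        position = nxt
        route.append(nxt)
    return travelled, route

# Hypothetical four-node road network (undirected, symmetric weights).
graph = {
    "s": {"a": 1, "b": 2},
    "a": {"s": 1, "t": 1},
    "b": {"s": 2, "t": 2},
    "t": {"a": 1, "b": 2},
}
blocked = {frozenset(("a", "t"))}
cost, route = optimistic_travel(graph, "s", "t", blocked)
```

In this toy network the traveler first heads toward `a`, discovers there that the road `a`-`t` is blocked, and has to backtrack through `s`. That detour is the cost of committing to a path without exploring, which is the trade-off the abstract describes.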

Related resources

Tuning continual exploration in reinforcement learning: An optimality property of the Boltzmann strategy

This paper presents a model allowing to tune continual exploration in an optimal way by integrating exploration and exploitation in a common framework. It first quantifies exploration by defining the degree of exploration of a state as the entropy of the probability distribution for choosing an admissible action in that state. Then, the exploration/exploitation tradeoff is formulated as a globa...

Managing the Exploration/Exploitation Trade-Off in Reinforcement Learning

In this work, we present a model that integrates both exploration and exploitation in a common framework. First of all, we define the concept of degree of exploration from a state as the entropy of the probability distribution on the set of admissible actions in this state. This entropy value allows to control the degree of exploration linked to this state, and should be provided by the user. T...

A Modified Genetic Algorithm for Finding Fuzzy Shortest Paths in Uncertain Networks

In realistic network analysis, there are several uncertainties in the measurements and computation of the arcs and vertices. These uncertainties should also be considered in realizing the shortest path problem (SPP) due to the inherent fuzziness in the body of expert's knowledge. In this paper, we investigated the SPP under uncertainty to evaluate our modified genetic strategy. We improved the ...

Optimal Tuning of Continual Online Exploration in Reinforcement Learning

This paper presents a framework allowing to tune continual exploration in an optimal way. It first quantifies the rate of exploration by defining the degree of exploration of a state as the probability-distribution entropy for choosing an admissible action. Then, the exploration/exploitation tradeoff is stated as a global optimization problem: find the exploration strategy that minimizes the ex...

Should I stay or should I go? How the human brain manages the trade-off between exploitation and exploration.

Many large and small decisions we make in our daily lives-which ice cream to choose, what research projects to pursue, which partner to marry-require an exploration of alternatives before committing to and exploiting the benefits of a particular choice. Furthermore, many decisions require re-evaluation, and further exploration of alternatives, in the face of changing needs or circumstances. Tha...

Publication date: 2017