Better Rates for Any Adversarial Deterministic MDP

Authors

  • Ofer Dekel
  • Elad Hazan

Abstract

We consider regret minimization in adversarial deterministic Markov Decision Processes (ADMDPs) with bandit feedback. We devise a new algorithm that pushes the state of the art forward in two ways. First, it attains a regret of $O(T^{2/3})$ with respect to the best fixed policy in hindsight, whereas the previous best regret bound was $O(T^{3/4})$. Second, the algorithm and its analysis are compatible with any feasible ADMDP graph topology, while all previous approaches required additional restrictions on the graph topology.
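
The following Python code makes the benchmark in this abstract concrete. It computes regret with respect to the best fixed policy in hindsight on a toy deterministic MDP. It is not the algorithm from the paper. It evaluates a given action sequence offline with full knowledge of the losses, whereas a bandit learner only observes the loss of the state-action pair it actually visits. The two-state MDP, the loss sequence, and all function names are hypothetical. The comparator class is assumed to be the deterministic stationary policies, which are enumerated by brute force.

```python
from itertools import product
from typing import Dict, Iterator, List, Sequence, Tuple

# Hypothetical encoding: states and actions are small ints.
StateAction = Tuple[int, int]
LossTable = Dict[StateAction, float]


def stationary_policies(states: Sequence[int],
                        actions_at: Dict[int, List[int]]) -> Iterator[Dict[int, int]]:
    """Enumerate every deterministic stationary policy (state -> action).

    Brute force: the count is the product of len(actions_at[s]) over all
    states, so this is only usable for toy instances.
    """
    for combo in product(*(actions_at[s] for s in states)):
        yield dict(zip(states, combo))


def trajectory_loss(actions: Sequence[int], start: int,
                    transition: Dict[StateAction, int],
                    losses: Sequence[LossTable]) -> float:
    """Cumulative loss of playing `actions` in order, starting from `start`."""
    state, total = start, 0.0
    for action, loss_t in zip(actions, losses):
        total += loss_t[(state, action)]
        state = transition[(state, action)]  # deterministic next state
    return total


def policy_loss(policy: Dict[int, int], start: int,
                transition: Dict[StateAction, int],
                losses: Sequence[LossTable]) -> float:
    """Cumulative loss of following a fixed policy for len(losses) rounds."""
    state, total = start, 0.0
    for loss_t in losses:
        action = policy[state]
        total += loss_t[(state, action)]
        state = transition[(state, action)]
    return total


def regret(actions: Sequence[int], start: int, states: Sequence[int],
           actions_at: Dict[int, List[int]],
           transition: Dict[StateAction, int],
           losses: Sequence[LossTable]) -> float:
    """Learner's cumulative loss minus that of the best fixed policy in hindsight."""
    learner = trajectory_loss(actions, start, transition, losses)
    best = min(policy_loss(p, start, transition, losses)
               for p in stationary_policies(states, actions_at))
    return learner - best


# Toy two-state MDP: action 0 = stay in place, action 1 = switch state.
STATES = [0, 1]
ACTIONS_AT = {0: [0, 1], 1: [0, 1]}
TRANSITION = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
# Illustrative loss sequence: in each of 4 rounds, being in state 0 costs 1.
LOSSES = [{(s, a): 1.0 if s == 0 else 0.0 for s in STATES for a in ACTIONS_AT[s]}
          for _ in range(4)]

if __name__ == "__main__":
    print(regret([0, 0, 1, 0], 0, STATES, ACTIONS_AT, TRANSITION, LOSSES))  # 2.0
```

In this example, the learner stays in the costly state for three rounds and incurs a loss of 3. The best policy switches once and then stays, for a loss of 1, so the regret is 2. A sublinear regret bound, such as the one stated in the abstract, means the average per-round gap vanishes as the number of rounds T grows.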


Similar resources

SPAR: stochastic programming with adversarial recourse

We consider a general adversarial stochastic optimization model. Our model involves the design of a system that an adversary may subsequently attempt to destroy or degrade. We introduce SPAR, which utilizes mixed-integer programming for the design decision and a Markov decision process (MDP) for the modeling of our adversarial phase.


99mTc-MDP bone scan guides in the identification of mesenteric vein thrombosis

A 50-year-old man with postprandial abdominal pain, weight loss, and generalized body ache was referred to the nuclear medicine department for a whole-body bone scan to look for any malignancy. Clinical examination did not reveal any specific positive findings. He underwent a Technetium-99m Methylene Diphosphonate (99mTc-MDP) bone scan, which showed no obvious bone pathology. But there was...


Intelligent Planning: A Markov Decision Process (MDP) Approach to Account for the Adversary*

A. Objectives: The problem of planning under uncertainty is one of the most important elements of a successful operation. In this context, planning that only accounts for a static, preconceived adversary will not suffice. Instead, an analysis of the enemy's evolving centers of gravity and the available means of attacking those centers is necessary. This latter approach provides better estimates of t...


Saturated Path-Constrained MDP: Planning under Uncertainty and Deterministic Model-Checking Constraints

In many probabilistic planning scenarios, a system's behavior needs to not only maximize the expected utility but also obey certain restrictions. This paper presents Saturated Path-Constrained Markov Decision Processes (SPC MDPs), a new MDP type for planning under uncertainty with deterministic model-checking constraints, e.g., "state s must be visited before s′", "the system must end up in s", ...


Enforceable Quality of Service Guarantees for Bursty Traffic Streams

Providing statistical quality-of-service guarantees introduces the conflicting requirements for both deterministic traffic models to isolate and police users and statistical multiplexing to efficiently utilize and share network resources. We address this issue by introducing two schemes for providing statistical services to deterministically policed sources: (1) adversarial mode resource allocati...


Publication date: 2013