Unimodal Bandits without Smoothness
Abstract
We consider stochastic bandit problems with a continuum set of arms and where the expected reward is a continuous and unimodal function of the arm. No further assumption is made regarding the smoothness and the structure of the expected reward function. We propose Stochastic Pentachotomy (SP), an algorithm for which we derive finite-time regret upper bounds. In particular, we show that, for any expected reward function $\mu$ that behaves as $\mu(x) = \mu(x^\star) - C|x - x^\star|^\xi$ locally around its maximizer $x^\star$ for some $\xi, C > 0$, the SP algorithm is order-optimal, i.e., its regret scales as $O(\sqrt{T \log(T)})$ when the time horizon $T$ grows large. This regret scaling is achieved without the knowledge of $\xi$ and $C$. Our algorithm is based on asymptotically optimal sequential statistical tests used to successively prune an interval that contains the best arm with high probability. To our knowledge, the SP algorithm constitutes the first sequential arm selection rule that achieves a regret scaling as $O(\sqrt{T})$ up to a logarithmic factor for non-smooth expected reward functions, as well as for smooth functions with unknown smoothness.
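The interval-pruning idea mentioned in the abstract can be illustrated with a much simpler routine. The sketch below is not the SP algorithm: it swaps the paper's asymptotically optimal sequential tests for fixed-budget comparisons with Hoeffding confidence radii, and the function names (`prune_unimodal`, `noisy_reward`), the sampling budget, and the test reward function $\mu(x) = 0.8 - 0.5|x - 0.3|$ are all assumptions made for illustration. It relies only on unimodality: if an arm is confidently worse than some arm to its right, the maximizer lies to its right, and symmetrically on the other side.

```python
import math
import random


def prune_unimodal(sample, lo=0.0, hi=1.0, rounds=12, pulls=2000, delta=0.01):
    """Shrink [lo, hi] around the maximizer of a unimodal mean reward.

    `sample(x)` is a hypothetical interface returning a noisy reward in [0, 1].
    Illustrative fixed-budget scheme, not the tests used by SP.
    """
    for _ in range(rounds):
        # Five equally spaced arms: both endpoints plus three interior points.
        xs = [lo + k * (hi - lo) / 4 for k in range(5)]
        means = [sum(sample(x) for _ in range(pulls)) / pulls for x in xs]
        # Hoeffding radius for [0, 1] rewards, union bound over the five arms.
        rad = math.sqrt(math.log(2 * len(xs) / delta) / (2 * pulls))
        new_lo, new_hi = lo, hi
        for i in range(5):
            for j in range(i + 1, 5):
                if means[i] + rad < means[j] - rad:
                    # Left arm confidently worse: maximizer is right of xs[i].
                    new_lo = max(new_lo, xs[i])
                elif means[j] + rad < means[i] - rad:
                    # Right arm confidently worse: maximizer is left of xs[j].
                    new_hi = min(new_hi, xs[j])
        # Stop when this budget cannot separate the arms any further.
        if (new_lo, new_hi) == (lo, hi) or new_lo >= new_hi:
            break
        lo, hi = new_lo, new_hi
    return lo, hi


rng = random.Random(0)


def noisy_reward(x):
    # Assumed non-smooth unimodal mean (xi = 1, C = 0.5, maximizer 0.3).
    mean = 0.8 - 0.5 * abs(x - 0.3)
    return 1.0 if rng.random() < mean else 0.0


lo, hi = prune_unimodal(noisy_reward)
print(lo, hi)
```

With a fixed number of pulls per arm, the pruning stalls once the gaps between neighbouring arms fall below the confidence radius. Handling that regime without knowing $\xi$ and $C$ is the part that the sequential tests in the paper address.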
Related papers
Unimodal Bandits: Regret Lower Bounds and Optimal Algorithms
We consider stochastic multi-armed bandits where the expected reward is a unimodal function over partially ordered arms. This important class of problems has been recently investigated in (Cope, 2009; Yu & Mannor, 2011). The set of arms is either discrete, in which case arms correspond to the vertices of a finite graph whose structure represents similarity in rewards, or continuous, in which ca...
Unimodal Bandits
We consider multiarmed bandit problems where the expected reward is unimodal over partially ordered arms. In particular, the arms may belong to a continuous interval or correspond to vertices in a graph, where the graph structure represents similarity in rewards. The unimodality assumption has an important advantage: we can determine if a given arm is optimal by sampling the possible directions...
Consistency of the Maximum Product of Spacings Method and Estimation of a Unimodal Distribution
The first part of this paper gives some general consistency theorems for the maximum product of spacings (MPS) method, an estimation method related to maximum likelihood. The second part deals with nonparametric estimation of a concave (convex) distribution and more generally a unimodal distribution, without smoothness assumptions on the densities. The MPS estimator for a distribution function ...
Verification Based Solution for Structured MAB Problems
We consider the problem of finding the best arm in a stochastic Multi-armed Bandit (MAB) game and propose a general framework based on verification that applies to multiple well-motivated generalizations of the classic MAB problem. In these generalizations, additional structure is known in advance, causing the task of verifying the optimality of a candidate to be easier than discovering the bes...
Journal:
- CoRR
Volume abs/1406.7447
Pages -
Publication date 2014