Minimizing Wide Range Regret with Time Selection Functions
Abstract
We consider the problem of minimizing regret with respect to a given set S of pairs of time selection functions and modification rules. We give an online algorithm that has O(√(T log |S|)) regret with respect to S when it is run for T time steps with N allowed actions. This improves on the upper bound of O(√(TN log(|I||F|))) given by Blum and Mansour [BM07a] for the case S = I × F, where I is a set of time selection functions and F is a set of modification rules. We do so by giving a simple reduction that uses an online algorithm for external regret as a black box.
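To make the quantity being bounded concrete, the sketch below computes the regret of a fixed action sequence with respect to a set S of (time selection function, modification rule) pairs. It assumes the usual definition from the Blum and Mansour line of work: for a pair (I, f), the regret is the sum over t of I(t) times the difference between the loss of the action played and the loss of the action f would have substituted. The function name `wide_range_regret`, the type aliases, and the toy data are hypothetical and are not taken from the paper; this illustrates the definition only, not the paper's algorithm or its reduction.

```python
from typing import Callable, Sequence, Tuple

# Assumed conventions (not from the paper): time steps are 0-indexed,
# I(t) lies in [0, 1], and a modification rule maps an action to an action.
TimeSelection = Callable[[int], float]
Modification = Callable[[int], int]


def wide_range_regret(
    actions: Sequence[int],
    losses: Sequence[Sequence[float]],
    pairs: Sequence[Tuple[TimeSelection, Modification]],
) -> float:
    """Return the largest weighted regret over all (I, f) pairs in `pairs`."""
    worst = float("-inf")
    for select, modify in pairs:
        regret = 0.0
        for t, (a, loss_t) in enumerate(zip(actions, losses)):
            # Weight the loss difference at step t by the time selection value.
            regret += select(t) * (loss_t[a] - loss_t[modify(a)])
        worst = max(worst, regret)
    return worst


# Toy instance with N = 2 actions and T = 4 steps.
actions = [0, 0, 1, 1]
losses = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]

always = lambda t: 1.0                         # selects every step
first_half = lambda t: 1.0 if t < 2 else 0.0   # selects steps 0 and 1 only
to_one = lambda a: 1                           # constant rule: always play action 1
swap = lambda a: 1 - a                         # swaps the two actions

pairs = [(always, to_one), (first_half, to_one), (always, swap)]
print(wide_range_regret(actions, losses, pairs))
```

In this toy instance the player always picks the worse action, so the swap rule under the always-on time selection function gives the largest regret. An algorithm with the guarantee stated in the abstract would keep this maximum at O(√(T log |S|)) for every loss sequence.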
Similar resources
A Probabilistic Model for Minmax Regret in Combinatorial Optimization
In this paper, we propose a probabilistic model for minimizing the anticipated regret in combinatorial optimization problems with distributional uncertainty in the objective coefficients. The interval uncertainty representation of data is supplemented with information on the marginal distributions. As a decision criterion, we minimize the worst-case conditional value-at-risk of regret. The prop...
Online Learning with Transductive Regret
We study online learning with the general notion of transductive regret, that is, regret with modification rules applying to expert sequences (as opposed to single experts) that are representable by weighted finite-state transducers. We show how transductive regret generalizes existing notions of regret, including: (1) external regret; (2) internal regret; (3) swap regret; and (4) conditional sw...
Minimizing Simple and Cumulative Regret in Monte-Carlo Tree Search
Regret minimization is important in both the Multi-Armed Bandit problem and Monte-Carlo Tree Search (MCTS). Recently, simple regret, i.e., the regret of not recommending the best action, has been proposed as an alternative to cumulative regret in MCTS, i.e., regret accumulated over time. Each type of regret is appropriate in different contexts. Although the majority of MCTS research applies the...
Stochastic Contextual Bandits with Known Reward Functions
Many sequential decision-making problems in communication networks such as power allocation in energy harvesting communications, mobile computational offloading, and dynamic channel selection can be modeled as contextual bandit problems which are natural extensions of the well-known multi-armed bandit problem. In these problems, each resource allocation or selection decision can make use of ava...
Strongly Adaptive Regret Implies Optimally Dynamic Regret
To cope with changing environments, recent developments in online learning have introduced the concepts of adaptive regret and dynamic regret independently. In this paper, we illustrate an intrinsic connection between these two concepts by showing that the dynamic regret can be expressed in terms of the adaptive regret and the functional variation. This observation implies that strongly adaptiv...