Search results for: regret analysis
Number of results: 2828405
Note that (2) implies (1) since: if regret is high in expectation over problem instances, then there exists at least one problem instance with high regret. Also, (1) implies (2) if |F| is a constant. This can be seen as follows: suppose we know that for any algorithm we have high regret (say H) with one problem instance in F and low regret with all other instances in F, then, taking a uniform ...
Suppose a decision maker has to purchase a commodity over time with varying prices and demands. In particular, the price per unit might depend on the amount purchased and this price function might vary from step to step. The decision maker has a buffer of bounded size for storing units of the commodity that can be used to satisfy demands at later points in time. We seek for an algorithm decidin...
We examine risk attitudes under regret theory and derive analytical expressions for two components—the resolution and regret premiums—of the risk premium under regret theory. We posit that regret-averse decision makers are risk seeking (resp., risk averse) for low (resp., high) probabilities of gains and that feedback concerning the forgone option reinforces risk attitudes. We test these hypoth...
We tackle the problem of online reward maximisation over a large finite set of actions described by their contexts. We focus on the case when the number of actions is too big to sample all of them even once. However we assume that we have access to the similarities between actions’ contexts and that the expected reward is an arbitrary linear function of the contexts’ images in the related repro...
Decision makers can become trapped by myopic regret avoidance in which rejecting feedback to avoid short-term outcome regret (regret associated with counterfactual outcome comparisons) leads to reduced learning and greater long-term regret over continuing poor decisions. In a series of laboratory experiments involving repeated choices among uncertain monetary prospects, participants primed with...
We consider a collaborative online learning paradigm, wherein a group of agents connected through a social network are engaged in playing a stochastic multi-armed bandit game. Each time an agent takes an action, the corresponding reward is instantaneously observed by the agent, as well as its neighbours in the social network. We perform a regret analysis of various policies in this collaborativ...
We study networks of communicating learning agents that cooperate to solve a common nonstochastic bandit problem. Agents use an underlying communication network to get messages about actions selected by other agents, and drop messages that took more than d hops to arrive, where d is a delay parameter. We introduce EXP3-COOP, a cooperative version of the EXP3 algorithm and prove that with K acti...
Online learning algorithms are designed to learn even when their input is generated by an adversary. The widely-accepted formal definition of an online algorithm’s ability to learn is the game-theoretic notion of regret. We argue that the standard definition of regret becomes inadequate if the adversary is allowed to adapt to the online algorithm’s actions. We define the alternative notion of p...