Search results for: rewards
Number of results: 16325. Filter results by year:
Previous research has shown that the value of large future rewards is discounted less steeply than is the value of small future rewards. These experiments extended this line of research to probabilistic rewards. Two experiments replicated the standard findings for delayed rewards but demonstrated that amount has an opposite effect on the discounting of probabilistic rewards. That is, large prob...
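The excerpt refers to how steeply the subjective value of a reward is discounted as its delay or improbability increases. A minimal sketch of the single-parameter hyperbolic model commonly used in this literature is given below; the amounts, delays, probabilities, and discount-rate parameters are illustrative assumptions, not values reported in the abstract.

```python
# Hyperbolic discounting of delayed and probabilistic rewards.
# Sketch of the standard hyperbolic forms; all parameter values below
# are illustrative assumptions, not taken from the paper above.

def discounted_value_delay(amount, delay, k):
    """Subjective value of `amount` received after `delay`: V = A / (1 + k*D)."""
    return amount / (1.0 + k * delay)

def discounted_value_probability(amount, p, h):
    """Subjective value of `amount` received with probability p,
    discounted by the odds against it, theta = (1 - p) / p: V = A / (1 + h*theta)."""
    odds_against = (1.0 - p) / p
    return amount / (1.0 + h * odds_against)

if __name__ == "__main__":
    # Pattern consistent with the abstract: a lower rate (k) for the larger
    # delayed amount, but a higher rate (h) for the larger probabilistic amount.
    print(discounted_value_delay(100, delay=12, k=0.10))     # small, delayed
    print(discounted_value_delay(1000, delay=12, k=0.05))    # large, delayed
    print(discounted_value_probability(100, p=0.5, h=0.5))   # small, probabilistic
    print(discounted_value_probability(1000, p=0.5, h=1.0))  # large, probabilistic
```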
Laboratory studies of choice and decision making among real monetary rewards typically use smaller real rewards than those common in real life. When laboratory rewards are large, they are almost always hypothetical. In applying laboratory results meaningfully to real-life situations, it is important to know the extent to which choices among hypothetical rewards correspond to choices among real ...
No abstract available.
We present a new bandit algorithm, SAO (Stochastic and Adversarial Optimal), whose regret is (essentially) optimal both for adversarial rewards and for stochastic rewards. Specifically, SAO combines the O(√n) worst-case regret of Exp3 (Auer et al., 2002b) and the (poly)logarithmic regret of UCB1 (Auer et al., 2002a) for stochastic rewards. Adversarial rewards and stochastic rewards are the two...
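For context, a minimal sketch of UCB1 (Auer et al., 2002a), the stochastic-bandit baseline whose logarithmic regret SAO matches, is shown below; this is not the SAO algorithm itself, whose details the excerpt does not give, and the Bernoulli arm means used are illustrative assumptions.

```python
import math
import random

def ucb1(arm_means, horizon):
    """Run UCB1 on Bernoulli arms with the given success probabilities."""
    n_arms = len(arm_means)
    counts = [0] * n_arms    # pulls per arm
    sums = [0.0] * n_arms    # summed rewards per arm
    total_reward = 0.0
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1      # play each arm once to initialise its estimate
        else:
            # pick the arm maximising its upper confidence bound
            arm = max(
                range(n_arms),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        reward = 1.0 if random.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        total_reward += reward
    return total_reward

if __name__ == "__main__":
    random.seed(0)
    print(ucb1([0.3, 0.5, 0.7], horizon=10_000))
```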
Chart of the number of search results per year