Search results for: internal rewards
Number of results: 248018
Laboratory studies of choice and decision making among real monetary rewards typically use smaller real rewards than those common in real life. When laboratory rewards are large, they are almost always hypothetical. In applying laboratory results meaningfully to real-life situations, it is important to know the extent to which choices among hypothetical rewards correspond to choices among real ...
No abstract available.
We present a new bandit algorithm, SAO (Stochastic and Adversarial Optimal) whose regret is (essentially) optimal both for adversarial rewards and for stochastic rewards. Specifically, SAO combines the O(√n) worst-case regret of Exp3 (Auer et al., 2002b) and the (poly)logarithmic regret of UCB1 (Auer et al., 2002a) for stochastic rewards. Adversarial rewards and stochastic rewards are the two...
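For context on the stochastic baseline named in the abstract above, here is a minimal Python sketch of the standard UCB1 index rule: play each arm once, then always pick the arm with the largest empirical mean plus the exploration bonus sqrt(2 ln t / n_i). This is not SAO and is not taken from the listed paper. The names `ucb1` and `demo`, the Bernoulli arm probabilities, the seed, and the horizon are all hypothetical choices made for illustration, and rewards are assumed to lie in [0, 1].

```python
import math
import random


def ucb1(pull, n_arms, horizon):
    """Run UCB1 for `horizon` rounds.

    `pull(arm)` is a caller-supplied function returning a reward in [0, 1].
    Returns the per-arm pull counts and empirical mean rewards.
    """
    counts = [0] * n_arms
    means = [0.0] * n_arms
    for t in range(1, horizon + 1):
        if t <= n_arms:
            # Initialisation phase: play every arm exactly once.
            arm = t - 1
        else:
            # Pick the arm maximising mean + sqrt(2 ln t / n_i).
            arm = max(
                range(n_arms),
                key=lambda i: means[i] + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        reward = pull(arm)
        counts[arm] += 1
        # Incremental update of the empirical mean.
        means[arm] += (reward - means[arm]) / counts[arm]
    return counts, means


def demo(seed=0, horizon=5000):
    """Hypothetical three-armed Bernoulli bandit used only for illustration."""
    rng = random.Random(seed)
    probs = [0.2, 0.5, 0.8]  # assumed arm success probabilities
    return ucb1(lambda a: 1.0 if rng.random() < probs[a] else 0.0, len(probs), horizon)
```

Calling `demo()` returns the pull counts and estimated means. On stochastic rewards like these, the suboptimal arms should be pulled only a logarithmic number of times in the horizon, which is the behaviour the abstract refers to as the (poly)logarithmic regret of UCB1.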