Search results for: modified thompson
Number of results: 257891
The fish abundance index over an ocean region is defined here to be the integral of expected catch per unit effort (CPUE), approximated by the sum of expected CPUE over grid squares. When trawl surveys are done within grid squares selected according to a probability sampling design, several other sources of variation such as the fish population dynamics and the catching process are also involve...
We present an actor-critic scheme for reinforcement learning in complex domains. The main contribution is to show that planning and I/O dynamics can be separated such that an intractable planning problem reduces to a simple multi-armed bandit problem, where each lever stands for a potentially arbitrarily complex policy. Furthermore, we use the Bayesian control rule to construct an adaptive band...
The following lemma is implied by Theorem 1 in Abbasi-Yadkori et al. (2011): Lemma 7 (Abbasi-Yadkori et al., 2011). Let $(\mathcal{F}'_t;\ t \ge 0)$ be a filtration, $(m_t;\ t \ge 1)$ be an $\mathbb{R}^d$-valued stochastic process such that $m_t$ is $\mathcal{F}'_{t-1}$-measurable, and $(\eta_t;\ t \ge 1)$ be a real-valued martingale difference process such that $\eta_t$ is $\mathcal{F}'_t$-measurable. For $t \ge 0$, define $\xi_t = \sum_{\tau=1}^{t} m_\tau \eta_\tau$ and $M_t = I_d + \sum_{\tau=1}^{t} m_\tau m_\tau^{T}$, wh...
In this paper, we extend the SHIFT-AND approach by Baeza-Yates and Gonnet (CACM 35(10), 1992) to the matching problem for network expressions, which are regular expressions without Kleene closure and are useful in applications such as bioinformatics and event stream processing. Following the study of Navarro (RECOMB, 2001) on extended string matching, we introduce new operations called Scatter, ...
The multi-objective multi-armed bandit (MOMAB) problem is a sequential decision process with stochastic rewards. Each arm generates a vector of rewards instead of a single scalar reward. Moreover, these multiple rewards might be conflicting. The MOMAB problem has a set of Pareto optimal arms, and an agent’s goal is not only to find that set but also to play the arms in that set evenly or fairly....
Consider the problem of learning a parametric distribution from observations. A frequentist approach to learning considers parameters to be fixed, and uses the data to learn those parameters as accurately as possible. For example, consider the problem of learning a Bernoulli distribution’s parameter (a random variable distributed as Bernoulli(μ) is 1 with probability μ and 0 with probability 1 −...
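The abstract above is cut off before it names an estimator, so the following is only a minimal sketch of one common frequentist choice, assuming the maximum-likelihood estimate of μ, which for Bernoulli data is the sample mean. The function name `bernoulli_mle` is hypothetical and does not come from the listed paper.

```python
def bernoulli_mle(observations):
    """Return the maximum-likelihood estimate of a Bernoulli parameter.

    `observations` is a non-empty sequence of 0/1 outcomes. The estimate
    is the fraction of ones, i.e. the sample mean. This is an assumed
    illustration, not the estimator used in the truncated abstract.
    """
    if not observations:
        raise ValueError("need at least one observation")
    if any(x not in (0, 1) for x in observations):
        raise ValueError("observations must be 0 or 1")
    return sum(observations) / len(observations)


# Three successes out of four trials.
estimate = bernoulli_mle([1, 0, 1, 1])
```

Here μ is treated as a fixed unknown constant and the data alone determine the point estimate; no prior over μ is involved.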
Kurt Z. Long, Jose Ignacio Santos, Jorge L. Rosado, Catalina Lopez-Saucedo, Rocio Thompson-Bonilla, Maricela Abonce, Herbert L. DuPont, Ellen Hertzmark, and Teresa Estrada-Garcia Department of Nutrition and Department of Epidemiology, Harvard School of Public Health, Boston, Massachusetts, and University of Texas Medical School and School of Public Health, Houston; Hospital Infantil de Mexico F...
Although the word “dog” and an unambiguous barking sound may point to the same concept DOG, verbal labels and nonverbal cues appear to activate conceptual information in systematically different ways (Lupyan & Thompson-Schill, 2012). Here we investigate these differences in more detail. We replicate the finding that labels activate a more prototypical representation than do sounds, and find tha...
Thompson sampling has impressive empirical performance for many multi-armed bandit problems. But current algorithms for Thompson sampling only work for the case of conjugate priors, since these algorithms require inferring the posterior, which is often computationally intractable when the prior is not conjugate. In this paper, we propose a novel algorithm for Thompson sampling which only requires...
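For orientation, the conjugate-prior case that the abstract contrasts itself with can be sketched as below. This is the textbook Beta-Bernoulli form of Thompson sampling, not the novel algorithm the abstract proposes (whose description is truncated). The function name, the uniform Beta(1, 1) prior, and the simulated arm means are all assumptions made for illustration.

```python
import random


def thompson_sampling(true_means, horizon, seed=0):
    """Simulate Beta-Bernoulli Thompson sampling; return pulls per arm.

    `true_means` are hypothetical success probabilities used only to
    simulate rewards. Each arm starts with an assumed Beta(1, 1) prior.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    successes = [0] * n_arms
    failures = [0] * n_arms
    pulls = [0] * n_arms
    for _ in range(horizon):
        # Conjugacy makes each posterior a Beta distribution, so
        # sampling from it needs only the running counts.
        samples = [
            rng.betavariate(successes[a] + 1, failures[a] + 1)
            for a in range(n_arms)
        ]
        # Play the arm whose sampled mean is largest.
        arm = max(range(n_arms), key=samples.__getitem__)
        reward = 1 if rng.random() < true_means[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1
    return pulls


pulls = thompson_sampling([0.1, 0.9], horizon=2000)
```

The posterior update here is just two counter increments, which is what conjugacy buys. With a non-conjugate prior there is no such closed form, and that is the difficulty the abstract points to.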