The second-to-last inequality follows from the observation that the event $E_i(t)$ was defined as $\hat{\mu}_i(t) > x_i$. At time $\tau_k+1$ for $k \ge 1$, $\hat{\mu}_i(\tau_k+1) = \frac{S_i(\tau_k+1)}{k+1} \le \frac{S_i(\tau_k+1)}{k}$, where the latter is simply the average of the outcomes observed from $k$ i.i.d. plays of arm $i$, each of which is a Bernoulli trial with mean $\mu_i$. Using Chernoff-Hoeffding bounds (Fact 1), we obtain that $\Pr(\hat{\mu}_i(\tau_k+1) > x_i) \le \Pr(S_i(\tau_k+1)/k >$ ...
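The Chernoff-Hoeffding step can be checked numerically. The sketch below assumes that Fact 1 takes the relative-entropy form $\Pr(S/k \ge x) \le e^{-k\,d(x,\mu)}$ for $x > \mu$, where $d$ is the KL divergence between Bernoulli distributions; this form is an assumption, since the statement of Fact 1 is not reproduced here. The function names `kl_bernoulli`, `upper_tail`, and `chernoff_bound` are hypothetical and chosen only for illustration. The code computes the exact upper tail of the average of $k$ i.i.d. Bernoulli($\mu$) outcomes and compares it with the bound.

```python
import math


def kl_bernoulli(x, mu):
    """KL divergence d(x, mu) between Bernoulli(x) and Bernoulli(mu), for 0 < x, mu < 1."""
    return x * math.log(x / mu) + (1 - x) * math.log((1 - x) / (1 - mu))


def upper_tail(k, mu, x):
    """Exact Pr(S / k > x), where S is the sum of k i.i.d. Bernoulli(mu) outcomes."""
    return sum(
        math.comb(k, s) * mu**s * (1 - mu) ** (k - s)
        for s in range(k + 1)
        if s > k * x
    )


def chernoff_bound(k, mu, x):
    """Assumed form of the bound: exp(-k * d(x, mu)), valid for x > mu."""
    return math.exp(-k * kl_bernoulli(x, mu))


if __name__ == "__main__":
    mu, x = 0.5, 0.75  # illustrative values with x > mu, not taken from the paper
    for k in (1, 4, 16, 64):
        print(k, upper_tail(k, mu, x), chernoff_bound(k, mu, x))
```

For every $k$ the exact tail should lie below the bound, and both decay exponentially in $k$, which is the property the argument relies on when summing over $k$.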