Thompson sampling is a heuristic algorithm for the multi-armed bandit problem which has a long tradition in machine learning. The algorithm has a Bayesian spirit in the sense that it selects arms based on posterior samples of the reward probabilities of each arm. By forging a connection between combinatorial binary bandits and spike-and-slab variable selection, we propose a stochastic optimization approach to subset selection ca...