In the classical multiarmed bandit problem, the aim is to find a policy maximizing the expected total reward, implicitly assuming that the decision-maker is risk-neutral. In some real-life applications, however, decision-makers are risk-averse. In this article, we design a new setting based on the concept of dynamic risk measures, where the aim is to find a policy with the best risk-adjusted discounted outcome. We provide a theoretical analysis pr...