In interactive multi-objective reinforcement learning (MORL), an agent has to simultaneously learn about the environment and the preferences of the user, in order to quickly zoom in on those decisions that are likely to be preferred by the user. In this paper, we study MORL in the context of multi-armed bandits. Contrary to earlier approaches that force the utility of the user to be expressed as a weighted sum of the values for each objective, we do not make such...