We consider a sequential assortment selection problem in which the user's choice is governed by a multinomial logit (MNL) choice model whose parameters are unknown. In each period, the learning agent observes d-dimensional contextual information about the user and the N available items, offers an assortment of size K to the user, and observes bandit feedback on the item chosen from the assortment. We propose upper confidence bound based algorithms for this contextual MNL bandit problem. ...
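To make the interaction loop concrete, here is a minimal sketch of one period. It is not the algorithm proposed here. It assumes a linear utility x_i^T theta for each item, an outside "no choice" option with utility 0, and unit revenue per chosen item. Under that unit-revenue assumption the expected reward s/(1+s), with s the sum of exp(utility) over the offered items, increases in every utility. Offering the K items with the largest optimistic (UCB) utilities is therefore the optimistic choice. The function names, the bonus form alpha * ||x||_{V^{-1}}, and the Gram-matrix update are hypothetical illustrations. The parameter-estimation step is omitted.

```python
import numpy as np


def mnl_choice_probs(X, theta):
    """MNL choice probabilities for the offered items (assumed linear utilities).

    X: (k, d) features of the offered items; theta: (d,) parameter.
    Returns a length-(k+1) array: index 0 is the assumed outside (no-choice)
    option with utility 0, and index j >= 1 is the j-th offered item.
    """
    utilities = X @ theta
    # Shift by the max utility (including the outside option's 0) for stability.
    m = max(0.0, utilities.max())
    w = np.exp(np.concatenate(([0.0], utilities)) - m)
    return w / w.sum()


def optimistic_assortment(X, theta_hat, V, K, alpha):
    """Hypothetical UCB step: the K items with the largest optimistic utility.

    Optimistic utility = x^T theta_hat + alpha * sqrt(x^T V^{-1} x).
    Top-K is optimistic only under the unit-revenue assumption above.
    """
    V_inv = np.linalg.inv(V)
    # Row-wise quadratic form x_i^T V^{-1} x_i for every item i.
    bonus = alpha * np.sqrt(np.einsum("ij,jk,ik->i", X, V_inv, X))
    ucb = X @ theta_hat + bonus
    return np.argsort(-ucb)[:K]


# One simulated period (all quantities are synthetic, for illustration only).
rng = np.random.default_rng(0)
N, d, K = 10, 3, 4
theta_star = rng.normal(size=d)  # true parameter, unknown to the agent
X = rng.normal(size=(N, d))      # contexts of the N available items
theta_hat = np.zeros(d)          # current estimate (estimation step omitted)
V = np.eye(d)                    # regularized Gram matrix (hypothetical)

S = optimistic_assortment(X, theta_hat, V, K, alpha=1.0)
probs = mnl_choice_probs(X[S], theta_star)
choice = rng.choice(K + 1, p=probs)  # 0 = no choice, j >= 1 = item S[j - 1]

# Simplified confidence update: accumulate outer products of offered items.
V = V + X[S].T @ X[S]
```

With theta_hat = 0 and V = I, the first offered assortment is simply the K items with the largest feature norms, so early rounds are driven entirely by the exploration bonus.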