We present a mean-variance policy iteration (MVPI) framework for risk-averse control in a discounted infinite-horizon MDP, optimizing the variance of the per-step reward random variable. MVPI enjoys great flexibility in that any policy evaluation method and risk-neutral control method can be dropped in off the shelf, in both on- and off-policy settings. This flexibility reduces the gap between risk-neutral control and risk-averse control, and is achieved by working on a novel augmented MDP directly. We propose risk-averse TD3 as an example instantiating MVPI.
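To make the iteration concrete, below is a minimal tabular sketch of the MVPI loop, not the paper's implementation. It assumes a transition tensor P of shape (states, actions, states) and a reward matrix R (both hypothetical names); plain value iteration stands in for whichever off-the-shelf risk-neutral control method is plugged in, the per-step reward mean y is evaluated under the policy's stationary distribution, and the augmented reward r - lam * (r - y)^2 with tradeoff coefficient lam reflects the mean-variance objective described above.

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, n_iter=500):
    """Risk-neutral control stand-in: greedy policy from value iteration."""
    V = np.zeros(R.shape[0])
    for _ in range(n_iter):
        Q = R + gamma * P @ V              # P: (S, A, S), V: (S,) -> Q: (S, A)
        V = Q.max(axis=1)
    return Q.argmax(axis=1)                # deterministic policy, shape (S,)

def average_reward(P, R, policy, n_iter=500):
    """Policy evaluation stand-in: mean per-step reward under the
    stationary distribution of the Markov chain induced by the policy."""
    n_states = R.shape[0]
    P_pi = P[np.arange(n_states), policy]  # (S, S) chain under the policy
    r_pi = R[np.arange(n_states), policy]  # (S,) per-step rewards
    d = np.full(n_states, 1.0 / n_states)
    for _ in range(n_iter):
        d = d @ P_pi                       # power iteration to stationarity
    return float(d @ r_pi)

def mvpi(P, R, lam=0.5, gamma=0.9, n_outer=20):
    """Mean-variance policy iteration (sketch): alternate policy evaluation
    and risk-neutral control on the augmented MDP."""
    policy = np.zeros(R.shape[0], dtype=int)
    for _ in range(n_outer):
        y = average_reward(P, R, policy)
        # Augmented MDP: identical dynamics, reward r - lam * (r - y)^2.
        R_aug = R - lam * (R - y) ** 2
        policy = value_iteration(P, R_aug, gamma)
    return policy

# Usage on a small random MDP (3 states, 2 actions):
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(3), size=(3, 2))
R = rng.normal(size=(3, 2))
print(mvpi(P, R, lam=0.5))
```

The inner solver and the evaluator are deliberately interchangeable here, which is the flexibility the framework claims: only the reward of the augmented MDP changes between iterations, so any risk-neutral method (e.g., an actor-critic in place of value iteration) can be substituted without altering the outer loop.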