Many sequential decision-making systems leverage data collected using prior policies to propose a new policy. For critical applications, it is important that high-confidence guarantees on the policy’s behavior are provided before deployment, ensure policy will behave as desired. Prior works have studied off-policy estimation of expected return, however, variance returns can be equally for high-...