1. Proofs

In this section we derive the proofs of all propositions in the main paper.

Proposition 1. The AD entropy of the generalized distribution of y can be written as the sum of the negative log-likelihood of y and the AD entropy of the conditional distribution of the hidden variable given the output,

$$H_{\alpha,\beta}(Q^{y}_{x}; w) = -\log P(y \mid x, w) + H_{\alpha,\beta}(P^{y}_{x}; w). \tag{1}$$

Proof. The AD entropy of the generali...