The question of how to determine which states and actions are responsible for a certain outcome is known as the credit assignment problem, and it remains a central research question in reinforcement learning and artificial intelligence. Eligibility traces enable efficient credit assignment to the recent sequence of states and actions experienced by the agent, but not to counterfactual sequences that could also have led to the current state. In this work, we introduce expected ...