This paper examines, by argument, the dynamics of sequences of behavioural choices made, when non-cooperative restricted-memory agents learn in partially observable stochastic games. These sequences of combined agent strategies (joint-policies) can be thought of as a walk through the space of all possible joint-policies. We argue that this walk, while containing random elements, is also driven ...