Automated agents that learn strategies for Markov decision processes over a continuous state space usually need to approximate the value function associated with the environment. Traditionally, as described by Sutton and Barto [1], this is done with a fixed number of rectangular features tiled across the state space, possibly distributed across multiple offset tilings (layers). As an improvement ...
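The traditional scheme above can be sketched as a simple tile coder. This is a minimal illustration, not the implementation from [1]: the function name `tile_features` and all parameters are hypothetical, and the offsets between tilings are taken to be uniform fractions of a tile width, one common choice among several.

```python
import numpy as np

def tile_features(state, low, high, tiles_per_dim, n_tilings):
    """Binary feature vector from overlapping tilings of a continuous state.

    Each tiling partitions the box [low, high] into a grid of rectangular
    tiles; successive tilings are shifted by a fraction of a tile width so
    that nearby states share some, but not all, active tiles.
    """
    low = np.asarray(low, dtype=float)
    high = np.asarray(high, dtype=float)
    state = np.asarray(state, dtype=float)
    dims = len(low)
    tile_width = (high - low) / tiles_per_dim
    tiles_per_tiling = tiles_per_dim ** dims
    features = np.zeros(n_tilings * tiles_per_tiling)
    for t in range(n_tilings):
        # Shift tiling t by t/n_tilings of a tile width in every dimension.
        offset = (t / n_tilings) * tile_width
        idx = np.floor((state - low + offset) / tile_width).astype(int)
        idx = np.clip(idx, 0, tiles_per_dim - 1)
        flat = np.ravel_multi_index(idx, (tiles_per_dim,) * dims)
        features[t * tiles_per_tiling + flat] = 1.0
    return features

# A 2-D state with a 4x4 grid per tiling and 3 tilings: exactly one tile
# is active in each tiling, so exactly n_tilings features are nonzero.
phi = tile_features([0.3, 0.7], low=[0, 0], high=[1, 1],
                    tiles_per_dim=4, n_tilings=3)
```

The approximate value is then a linear function of these binary features, so learning reduces to adjusting one weight per tile; the number of features is fixed in advance by the grid resolution and the number of tilings.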