1 Markov Decision Processes

Consider a Markov chain (w(k), a(k)) defined for k = 0, 1, ... and with w(k) ∈ W, a(k) ∈ A, where W and A are finite sets representing the system state space and the action space, respectively. The transition probabilities are defined by the function

P_θ(w′, a′, w, a) = Pr{w(k + 1) = w, a(k + 1) = a | w(k) = w′, a(k) = a′}.

Here, θ ∈ R^N is a vector of policy parameters...
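As a rough illustration of such a θ-parameterized joint chain (a sketch, not the source's construction: the tabular state kernel, the softmax action parameterization, and all names below are assumptions), the transition function P_θ over pairs (w, a) could be assembled as a row-stochastic matrix:

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a parameter vector.
    z = np.exp(x - x.max())
    return z / z.sum()

def transition_matrix(theta, P_w, n_states, n_actions):
    """Build P_theta over joint pairs (w, a).

    Assumed factorization (an illustration, not from the source):
    the state moves via a fixed kernel P_w[w_prev, a_prev, w_next],
    and the next action is drawn from a softmax policy with one
    parameter row theta[w_next] per state.
    """
    n = n_states * n_actions
    P = np.zeros((n, n))
    for wp in range(n_states):
        for ap in range(n_actions):
            for w in range(n_states):
                pi = softmax(theta[w])  # policy at the successor state
                for a in range(n_actions):
                    P[wp * n_actions + ap, w * n_actions + a] = (
                        P_w[wp, ap, w] * pi[a]
                    )
    return P

# Tiny example: 2 states, 2 actions.
rng = np.random.default_rng(0)
P_w = rng.random((2, 2, 2))
P_w /= P_w.sum(axis=2, keepdims=True)   # normalize over next state
theta = rng.standard_normal((2, 2))     # N = 4 policy parameters here
P = transition_matrix(theta, P_w, 2, 2)
print(np.allclose(P.sum(axis=1), 1.0))  # each row of P_theta sums to 1
```

Because each row of P_w sums to 1 over the next state and each softmax sums to 1 over actions, every row of P_θ sums to 1, so (w(k), a(k)) is indeed a Markov chain on the finite set W × A.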