TD(λ) Converges with Probability 1

Peter Dayan, Terrence J. Sejnowski
Abstract
The methods of temporal differences (Samuel, 1959; Sutton, 1984, 1988) allow agents to learn accurate predictions about stationary stochastic future outcomes. The learning is effectively stochastic approximation based on samples extracted from the process generating the agent's future. Sutton (1988) proved that for a special case of temporal differences, the expected values of the predictions converge to their correct values, as larger samples are taken, and Dayan (1992) extended his proof to the general case. This paper proves the stronger result that the predictions of a slightly modified form of temporal difference learning converge with probability one, and shows how to quantify the rate of convergence.
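As a concrete illustration of the kind of learning rule the abstract describes, the following is a minimal sketch of TD(λ) prediction on Sutton's (1988) five-state random walk, using a decaying step size satisfying the usual stochastic-approximation conditions (step sizes sum to infinity while their squares sum to a finite value), of the sort required for convergence with probability one. The function name, episode count, λ value, and step-size schedule are illustrative choices, not taken from the paper.

```python
import random

def td_lambda_random_walk(episodes=20000, lam=0.8, seed=0):
    """TD(lambda) prediction on a five-state random walk.

    States 1..5 are nonterminal; stepping left from state 1 terminates
    with reward 0, stepping right from state 5 terminates with reward 1.
    The true value of state i is i/6.
    """
    rng = random.Random(seed)
    V = [0.0] * 7              # V[0] and V[6] are terminal, value 0
    for ep in range(episodes):
        # Decaying step size: sum(alpha) diverges, sum(alpha^2) converges,
        # the standard condition for convergence with probability one.
        alpha = 0.5 / (1.0 + 0.01 * ep)
        e = [0.0] * 7          # accumulating eligibility traces
        s = 3                  # start in the middle state
        while s not in (0, 6):
            s2 = s + rng.choice((-1, 1))
            r = 1.0 if s2 == 6 else 0.0
            delta = r + V[s2] - V[s]    # TD error (undiscounted)
            e[s] += 1.0
            for i in range(1, 6):
                V[i] += alpha * delta * e[i]
                e[i] *= lam             # trace decay
            s = s2
    return V[1:6]

print(td_lambda_random_walk())
```

With enough episodes the five estimates settle near the true values 1/6, 2/6, ..., 5/6, matching the convergence behaviour the paper analyzes for this family of algorithms.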