We consider a prospect-theoretic version of the classical Q-learning algorithm for discounted-reward Markov decision processes, wherein the controller perceives a distorted and noisy future reward, modeled by a nonlinearity that accentuates gains and under-represents losses relative to a reference point. We analyze the asymptotic behavior of the scheme by analyzing its limiting differential equation using the theory of monotone dyn...
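As a rough illustration of the kind of update described above, the following minimal Python sketch runs tabular Q-learning on a toy MDP while passing the perceived future value through a Kahneman–Tversky-style distortion (concave for gains, steeper and convex for losses relative to a reference point). The toy MDP, the specific distortion function, its parameters (`alpha`, `lam`, `ref`), and the placement of the distortion on the next-state value are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy MDP (illustrative): random transition kernel and rewards.
n_states, n_actions = 5, 2
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a] is a distribution over next states
R = rng.normal(size=(n_states, n_actions))                        # immediate rewards

gamma = 0.9               # discount factor
ref = 0.0                 # reference point separating gains from losses
alpha, lam = 0.88, 2.25   # illustrative prospect-theory parameters

def distort(x):
    """Piecewise power utility: concave for gains, steeper for losses (assumed form)."""
    d = x - ref
    return d**alpha if d >= 0 else -lam * (-d)**alpha

Q = np.zeros((n_states, n_actions))
s = 0
for t in range(50_000):
    # Epsilon-greedy action selection.
    a = rng.integers(n_actions) if rng.random() < 0.1 else int(Q[s].argmax())
    s_next = rng.choice(n_states, p=P[s, a])
    # The controller perceives a distorted version of the future value.
    target = R[s, a] + gamma * distort(Q[s_next].max())
    lr = 1.0 / (1 + t) ** 0.6   # tapering step size, as in stochastic approximation
    Q[s, a] += lr * (target - Q[s, a])
    s = s_next
```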