Cognitive Science Honors Thesis: A Computational Account of Sensory Prediction Error Gating in Reinforcement Learning Models
Abstract
A successful return in tennis requires a tennis player first to determine where best to place her return and then to correctly execute her swing. If she makes an errant return, she now faces a credit assignment problem: Should this negative outcome be attributed to poor shot selection or to an error in motor execution? McDougle et al. propose a solution to this problem when the source of the error is the motor system. They posit that motor errors are communicated to the decision-making system whereby they gate learning in order to prevent the undesired negative reinforcement of the chosen action. This gating hypothesis was motivated by recent anatomical evidence that the cerebellum — a crucial node in a network widely thought to process motor execution errors — sends direct subcortical projections to the basal ganglia — a crucial node in a network widely thought to drive reinforcement learning and decision-making. In McDougle et al.’s gating model, motor execution errors scale learning rates in a temporal difference (TD) reinforcement learning model of decision-making. However, the most prominent attempts to link the basal ganglia to reinforcement learning models have instead suggested that actor-critic (AC) models may be more appropriate models of basal ganglia anatomy. In the present study, we investigate the gating hypothesis from the perspective of AC models. We find that AC models can account for McDougle et al.’s behavioral results, but that gating is necessary for them to do so. Additionally, we find that simultaneous gating of the actor and critic is the only AC gating model that can account for subject data, but that AC models do not account for subject data better than TD models. We conclude with a discussion of the biological plausibility of the proposed gating mechanism from the perspective of the AC gating model.
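The gating idea in the abstract can be illustrated with a minimal sketch. It is not the thesis's or McDougle et al.'s fitted model: it assumes a single-state (bandit-style) task, a `gate` value in [0, 1] that shrinks toward 0 when a motor execution error is detected, and hypothetical function and parameter names (`gated_td_update`, `gated_ac_update`, `alpha`, `gate_actor`, `gate_critic`). The only point is to show where a gate would multiply the learning rate in a value-based update versus an actor-critic update.

```python
import math


def softmax(prefs, beta=1.0):
    """Turn action values or preferences into choice probabilities."""
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(beta * (p - m)) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def gated_td_update(q, action, reward, alpha, gate):
    """Value update whose learning rate is scaled by a gate.

    Assumption: gate is in [0, 1]; gate = 0 blocks learning on trials
    attributed to a motor execution error, gate = 1 leaves it intact.
    Mutates q in place and returns the prediction error.
    """
    delta = reward - q[action]          # reward prediction error
    q[action] += alpha * gate * delta   # gated learning rate
    return delta


def gated_ac_update(v, prefs, action, reward,
                    alpha_critic, alpha_actor,
                    gate_critic, gate_actor):
    """Actor-critic update with separate gates on critic and actor.

    Setting both gates to the same value corresponds to simultaneous
    gating; setting one gate to 1 gates only the other component.
    Mutates prefs in place and returns (new state value, prediction error).
    """
    delta = reward - v                                   # critic's prediction error
    v_new = v + alpha_critic * gate_critic * delta       # gated critic update
    prefs[action] += alpha_actor * gate_actor * delta    # gated actor update
    return v_new, delta
```

With `gate = 1` an unrewarded trial lowers the value of the chosen action as usual; with `gate = 0` the same outcome leaves the values untouched, which is the sense in which a motor error would protect the chosen action from negative reinforcement.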
Similar resources
Cognitive flexibility in adolescence: Neural and behavioral mechanisms of reward prediction error processing in adaptive decision making during development
Adolescence is associated with quickly changing environmental demands which require excellent adaptive skills and high cognitive flexibility. Feedback-guided adaptive learning and cognitive flexibility are driven by reward prediction error (RPE) signals, which indicate the accuracy of expectations and can be estimated using computational models. Despite the importance of cognitive flexibility d...
Mechanisms of hierarchical reinforcement learning in corticostriatal circuits 1: computational analysis.
Growing evidence suggests that the prefrontal cortex (PFC) is organized hierarchically, with more anterior regions having increasingly abstract representations. How does this organization support hierarchical cognitive control and the rapid discovery of abstract action rules? We present computational models at different levels of description. A neural circuit model simulates interacting cortico...
Confirmation bias in human reinforcement learning: Evidence from counterfactual feedback processing
Previous studies suggest that factual learning, that is, learning from obtained outcomes, is biased, such that participants preferentially take into account positive, as compared to negative, prediction errors. However, whether or not the prediction error valence also affects counterfactual learning, that is, learning from forgone outcomes, is unknown. To address this question, we analysed the ...
Individual differences and the neural representations of reward expectation and reward prediction error.
Reward expectation and reward prediction errors are thought to be critical for dynamic adjustments in decision-making and reward-seeking behavior, but little is known about their representation in the brain during uncertainty and risk-taking. Furthermore, little is known about what role individual differences might play in such reinforcement processes. In this study, it is shown behavioral and ...
Implicit Value Updating Explains Transitive Inference Performance: The Betasort Model
Transitive inference (the ability to infer that B > D given that B > C and C > D) is a widespread characteristic of serial learning, observed in dozens of species. Despite these robust behavioral effects, reinforcement learning models reliant on reward prediction error or associative strength routinely fail to perform these inferences. We propose an algorithm called betasort, inspired by cognit...