Search results for: bellman
Number of results: 4956
We show that competitive equilibria in a range of models related to production networks can be recovered as solutions to dynamic programs. Although these programs fail to be contractive, we prove that they are tractable. As an illustration, we treat Coase's theory of the firm, chains with transaction costs, and multiple partners. We then show how the same techniques extend to other equilibrium and decision problems, such as the distributio...
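As a loose illustration of the abstract's point that a dynamic program can fail to be contractive yet remain tractable (this is not the paper's model), the sketch below iterates an undiscounted Bellman operator for minimum-cost values on a small made-up graph. Without discounting the operator is not a sup-norm contraction, but the iteration still converges here.

```python
import numpy as np

# Undiscounted cost-to-go on a tiny graph, computed by repeatedly applying the
# Bellman operator (Bellman-Ford style sweeps). The graph and costs are made up.
INF = np.inf
costs = {                     # costs[u][v] = cost of moving from node u to node v
    0: {1: 4.0, 2: 1.0},
    1: {3: 1.0},
    2: {1: 2.0, 3: 5.0},
    3: {},                    # terminal node
}

v = {u: (0.0 if u == 3 else INF) for u in costs}   # value = cost-to-go to node 3
for _ in range(len(costs)):                        # enough sweeps to converge
    v = {u: (0.0 if u == 3 else
             min((c + v[w] for w, c in costs[u].items()), default=INF))
         for u in costs}

print(v)   # {0: 4.0, 1: 1.0, 2: 3.0, 3: 0.0}
```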
Consider a given value function on states of a Markov decision problem, as might result from applying a reinforcement learning algorithm. Unless this value function equals the corresponding optimal value function, at some states there will be a discrepancy, which is natural to call the Bellman residual, between what the value function specifies at that state and what is obtained by a one-step lo...
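As a rough illustration of the residual described here (not the paper's algorithm), the sketch below computes the per-state Bellman residual for a small tabular MDP; the transition matrices, rewards, and value estimate are made-up examples.

```python
import numpy as np

def bellman_residual(v, P, R, gamma):
    """Per-state residual |v(s) - max_a (R[a][s] + gamma * P[a][s] @ v)|,
    i.e. the gap between v and its one-step lookahead (Bellman backup)."""
    backups = np.array([R[a] + gamma * P[a] @ v for a in range(len(P))])  # (A, S)
    lookahead = backups.max(axis=0)     # greedy one-step lookahead value
    return np.abs(v - lookahead)        # zero everywhere iff v is optimal

# Tiny illustrative MDP: 2 states, 2 actions.
P = [np.array([[0.9, 0.1], [0.2, 0.8]]),    # P[a][s, s'] transition probabilities
     np.array([[0.5, 0.5], [0.6, 0.4]])]
R = [np.array([1.0, 0.0]), np.array([0.5, 0.5])]   # R[a][s] expected rewards
v = np.array([2.0, 1.0])                # some learned value estimate
print(bellman_residual(v, P, R, gamma=0.95))
```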
This paper aims at theoretically and empirically comparing two standard optimization criteria for Reinforcement Learning: i) maximization of the mean value and ii) minimization of the Bellman residual. For that purpose, we place ourselves in the framework of policy search algorithms, which are usually designed to maximize the mean value, and derive a method that minimizes the residual ‖T∗vπ − vπ...
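For concreteness, a hedged tabular sketch of the two criteria, the mean value of a policy under a state distribution nu and a weighted Bellman optimality residual, is given below; the MDP representation and the norm choice are assumptions for illustration, not the paper's exact setup.

```python
import numpy as np

def policy_value(pi, P, R, gamma):
    """Exact v_pi for a tabular MDP: solve (I - gamma * P_pi) v = r_pi.
    pi[s, a] is the policy, P[a][s, s'] transitions, R[a][s] rewards."""
    S = pi.shape[0]
    P_pi = sum(pi[:, a:a + 1] * P[a] for a in range(len(P)))   # (S, S)
    r_pi = sum(pi[:, a] * R[a] for a in range(len(P)))         # (S,)
    return np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)

def mean_value(v_pi, nu):
    """Criterion (i): expected value of the policy under state distribution nu."""
    return nu @ v_pi

def bellman_optimality_residual(v_pi, P, R, gamma, nu):
    """Criterion (ii): nu-weighted L1 norm of T* v_pi - v_pi."""
    Tstar_v = np.max([R[a] + gamma * P[a] @ v_pi for a in range(len(P))], axis=0)
    return nu @ np.abs(Tstar_v - v_pi)
```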
Natural learners must compute an estimate of future outcomes that follow from a stimulus in continuous time. Critically, the learner cannot in general know a priori the relevant time scale over which meaningful relationships will be observed. Widely used reinforcement learning algorithms discretize continuous time and use the Bellman equation to estimate exponentially-discounted future reward. ...
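The abstract's point that discretizing continuous time and applying the Bellman equation commits the learner to a single exponential discount rate can be seen in a minimal TD(0) sketch; the step size, time constant, and update rule below are illustrative assumptions, not the paper's model.

```python
import numpy as np

dt = 0.1                     # width of each time bin after discretizing time
tau = 2.0                    # assumed time scale of discounting
gamma = np.exp(-dt / tau)    # exponential discount per discrete step

def td0_update(V, s, r, s_next, alpha=0.1):
    """One TD(0) step toward the Bellman target r + gamma * V[s_next]."""
    target = r + gamma * V[s_next]
    V[s] += alpha * (target - V[s])
    return V

# A reward arriving a time t in the future is weighted by gamma ** (t / dt)
# = exp(-t / tau), so the choice of gamma fixes one time scale in advance.
V = np.zeros(5)
V = td0_update(V, s=0, r=1.0, s_next=1)
```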