Maximal Average-Reward Policies for Semi-Markov Decision Processes With Arbitrary State and Action Space
Related resources
Markov Decision Processes with Arbitrary Reward Processes
We consider a learning problem where the decision maker interacts with a standard Markov decision process, with the exception that the reward functions vary arbitrarily over time. We show that, against every possible realization of the reward process, the agent can perform as well—in hindsight—as every stationary policy. This generalizes the classical no-regret result for repeated games. Specif...
Pseudometrics for State Aggregation in Average Reward Markov Decision Processes
We consider how state similarity in average reward Markov decision processes (MDPs) may be described by pseudometrics. Introducing the notion of adequate pseudometrics which are well adapted to the structure of the MDP, we show how these may be used for state aggregation. Upper bounds on the loss that may be caused by working on the aggregated instead of the original MDP are given and compared ...
The Policy Iteration Algorithm for Average Reward Markov Decision Processes with General State Space
The average cost optimal control problem is addressed for Markov decision processes with unbounded cost. It is found that the policy iteration algorithm generates a sequence of policies which are c-regular (a strong stability condition), where c is the cost function under consideration. This result only requires the existence of an initial c-regular policy and an irreducibility condition on the...
Average-Reward Decentralized Markov Decision Processes
Formal analysis of decentralized decision making has become a thriving research area in recent years, producing a number of multi-agent extensions of Markov decision processes. While much of the work has focused on optimizing discounted cumulative reward, optimizing average reward is sometimes a more suitable criterion. We formalize a class of such problems and analyze its characteristics, show...
Average Cost Semi-Markov Decision Processes
The Semi-Markov Decision model is considered under the criterion of long-run average cost. A new criterion, which for any policy considers the limit of the expected cost incurred during the first n transitions divided by the expected length of the first n transitions, is considered. Conditions guaranteeing that an optimal stationary (nonrandomized) policy exists are then presented. It is also ...
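The ratio criterion described in this abstract can be written out as a short formula. The sketch below is an illustration only: the symbols are assumptions rather than the paper's own notation, with C_k standing for the cost incurred during the k-th transition, \tau_k for the length of that transition, and \pi for the policy, and it assumes the limit exists.

```latex
% Hypothetical notation (not taken from the paper):
%   C_k     cost incurred during the k-th transition
%   \tau_k  length (duration) of the k-th transition
%   \pi     the policy under which expectations are taken
\phi(\pi) \;=\; \lim_{n \to \infty}
  \frac{\mathbb{E}_{\pi}\!\left[\sum_{k=1}^{n} C_k\right]}
       {\mathbb{E}_{\pi}\!\left[\sum_{k=1}^{n} \tau_k\right]}
```

Under this reading, the numerator is the expected cost over the first n transitions and the denominator is the expected time those n transitions take, so the ratio is a cost per unit time.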
Journal
Journal title: The Annals of Mathematical Statistics
Year: 1971
ISSN: 0003-4851
DOI: 10.1214/aoms/1177693170