Solving Finite-Horizon Discounted Non-Stationary MDPs
Abstract
Research background: Markov Decision Processes (MDPs) are a powerful framework for modeling many real-world problems with finite horizons, where the aim is to maximize the reward obtained from a given sequence of actions. However, applications such as investment and financial markets, where value decreases exponentially over time, require the introduction of interest rates.
Purpose: This study investigates a non-stationary finite-horizon discount factor to account for fluctuations in rewards over time.
Research methodology: To consider the time value of money, the authors define a new non-stationary discount factor. First, the existence of an optimal policy for the proposed discounted model is proven. Next, a Discounted Backward Induction (DBI) algorithm is presented to find it. To support their proposal, the model is applied to an example MDP, which the adaptive algorithm solves.
Results: The method calculates the values of the MDP and its expected total return with consideration of the time value of money.
Novelty: No existing studies have previously examined dynamic temporal rewards.
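The abstract names a Discounted Backward Induction (DBI) algorithm but gives no pseudocode, so the following is only a minimal sketch of backward induction with a stage-dependent discount factor; the function name, the array shapes, and the interest-rate interpretation of the discounts are illustrative assumptions, not the paper's specification.

```python
import numpy as np

def discounted_backward_induction(P, R, gammas):
    """Backward induction for a finite-horizon MDP with a
    non-stationary (per-stage) discount factor.

    P: array (T, S, A, S) -- transition probabilities at each stage
    R: array (T, S, A)    -- expected rewards at each stage
    gammas: length-T      -- one-step discount between stages t and t+1;
            with a non-stationary interest rate r_t one could take
            gammas[t] = 1.0 / (1.0 + r_t)  (an illustrative assumption)
    Returns the value function V[t, s] and a greedy policy pi[t, s].
    """
    T, S, A, _ = P.shape
    V = np.zeros((T + 1, S))           # terminal values V[T, :] = 0
    pi = np.zeros((T, S), dtype=int)
    for t in reversed(range(T)):       # sweep backwards over stages
        # Q[s, a] = r_t(s, a) + gamma_t * sum_s' P_t(s' | s, a) V_{t+1}(s')
        Q = R[t] + gammas[t] * P[t] @ V[t + 1]
        V[t] = Q.max(axis=1)
        pi[t] = Q.argmax(axis=1)
    return V, pi
```

With a constant `gammas[t] = gamma` this collapses to classical finite-horizon backward induction, which is the sanity check one would expect any such variant to pass.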
Similar Resources
Lazy Approximation for Solving Continuous Finite-Horizon MDPs
Solving Markov decision processes (MDPs) with continuous state spaces is a challenge due to, among other problems, the well-known curse of dimensionality. Nevertheless, numerous real-world applications such as transportation planning and telescope observation scheduling exhibit a critical dependence on continuous states. Current approaches to continuous-state MDPs include discretizing their tra...
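This entry cites discretization as a current approach to continuous-state MDPs; below is a minimal sketch of the most naive variant, uniform grid discretization, which also makes the curse of dimensionality concrete (with b bins per axis the grid has b**d cells). All names and shapes are illustrative; this is not the paper's lazy-approximation method.

```python
import numpy as np

def make_grid(lo, hi, bins_per_dim):
    """Cell edges of a uniform grid over the box [lo, hi] in R^d."""
    return [np.linspace(l, h, b + 1)
            for l, h, b in zip(lo, hi, bins_per_dim)]

def state_to_cell(x, edges):
    """Map a continuous state x to the index tuple of its grid cell,
    clamping states outside the box onto the boundary cells."""
    cell = []
    for xi, e in zip(x, edges):
        i = int(np.searchsorted(e, xi, side="right")) - 1
        cell.append(max(0, min(i, len(e) - 2)))
    return tuple(cell)

# Example: a 2-D state space with 10 bins per axis -> 100 discrete states.
edges = make_grid(lo=[0.0, -1.0], hi=[1.0, 1.0], bins_per_dim=[10, 10])
print(state_to_cell([0.37, 0.05], edges))   # -> (3, 5)
```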
On the Use of Non-Stationary Policies for Infinite-Horizon Discounted Markov Decision Processes
We consider infinite-horizon γ-discounted Markov Decision Processes, for which it is known that there exists a stationary optimal policy. We consider the algorithm Value Iteration and the sequence of policies π1, …, πk it implicitly generates until some iteration k. We provide performance bounds for non-stationary policies involving the last m generated policies that reduce the state-of-t...
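The construction this entry studies, running Value Iteration and then deploying a non-stationary policy that cycles through the last m greedy policies instead of committing to the final one, can be sketched as follows for a tabular MDP. The shapes and names are illustrative assumptions; the exact bound-achieving scheme is in the paper.

```python
import numpy as np

def value_iteration_policies(P, R, gamma, k):
    """Run k iterations of Value Iteration on a tabular MDP
    (P of shape (S, A, S), R of shape (S, A)) and record the greedy
    policy implicitly generated at every iteration: pi_1, ..., pi_k."""
    V = np.zeros(P.shape[0])
    policies = []
    for _ in range(k):
        Q = R + gamma * P @ V          # one-step lookahead Q[s, a]
        policies.append(Q.argmax(axis=1))
        V = Q.max(axis=1)
    return V, policies

def cyclic_policy(policies, m):
    """Non-stationary policy that cycles through the last m greedy
    policies rather than committing to pi_k alone."""
    tail = policies[-m:][::-1]         # pi_k, pi_{k-1}, ..., pi_{k-m+1}
    def act(state, t):
        return int(tail[t % m][state])
    return act
```

With m = 1 this reduces to the usual stationary greedy policy π_k; the entry's bounds quantify how using larger m improves on that choice.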
PAC Bounds for Discounted MDPs
We study upper and lower bounds on the sample-complexity of learning near-optimal behaviour in finite-state discounted Markov Decision Processes (MDPs). We prove a new bound for a modified version of Upper Confidence Reinforcement Learning (UCRL) with only cubic dependence on the horizon. The bound is unimprovable in all parameters except the size of the state/action space, where it depends line...
Trial-Based Heuristic Tree Search for Finite Horizon MDPs
Dynamic programming is a well-known approach for solving MDPs. In large state spaces, asynchronous versions like Real-Time Dynamic Programming (RTDP) have been applied successfully. If unfolded into equivalent trees, Monte-Carlo Tree Search algorithms are a valid alternative. UCT, the most popular representative, obtains good anytime behavior by guiding the search towards promising areas of the...
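To make UCT's "guiding the search towards promising areas" concrete, here is the standard UCB1 selection rule it applies at each tree node; this is textbook UCT machinery, not code from the paper, and all names are illustrative.

```python
import math

def ucb1_select(node_visits, child_visits, child_values, c=math.sqrt(2)):
    """UCB1 action selection as used by UCT at each tree node:
    trade off the empirical mean return of an action (exploitation)
    against a bonus for actions tried rarely (exploration)."""
    def score(a):
        if child_visits[a] == 0:
            return float("inf")        # force every action to be tried once
        mean = child_values[a] / child_visits[a]
        bonus = c * math.sqrt(math.log(node_visits) / child_visits[a])
        return mean + bonus
    return max(range(len(child_visits)), key=score)

# Example: of three actions, the under-explored one gets the highest score.
print(ucb1_select(node_visits=30,
                  child_visits=[20, 9, 1],
                  child_values=[10.0, 5.4, 0.9]))   # -> 2
```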
Journal
Journal title: Folia Oeconomica Stetinensia
Year: 2023
ISSN: 1898-0198, 1730-4237
DOI: https://doi.org/10.2478/foli-2023-0001