Solving Finite-Horizon Discounted Non-Stationary MDPs
Abstract
Research background: Markov Decision Processes (MDPs) are a powerful framework for modeling many real-world problems with finite horizons, where the aim is to maximize the reward obtained from a given sequence of actions. However, applications such as investment and financial markets, where value decreases exponentially over time, require the introduction of interest rates.
Purpose: This study investigates a non-stationary finite-horizon discount factor to account for fluctuations in rewards over time.
Research methodology: To consider the time value of money, the authors define a new non-stationary discount factor. First, the existence of an optimal policy for the proposed discounted model is proven. Next, a Discounted Backward Induction (DBI) algorithm is presented to find it. To support their proposal, the model is applied to an example MDP, which the adaptive algorithm solves.
Results: The method calculates the values of the MDP and its expected total return with consideration of the time value of money.
Novelty: No existing studies have previously examined dynamic temporal rewards.
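The abstract names a Discounted Backward Induction (DBI) algorithm but gives no pseudocode, so the following is only a minimal sketch of backward induction with a stage-dependent discount factor; the function name, the array shapes, and the interest-rate interpretation of the discounts are illustrative assumptions, not the paper's specification.

```python
import numpy as np

def discounted_backward_induction(P, R, gammas):
    """Backward induction for a finite-horizon MDP with a
    non-stationary (per-stage) discount factor.

    P: array (T, S, A, S) -- transition probabilities at each stage
    R: array (T, S, A)    -- expected rewards at each stage
    gammas: length-T      -- one-step discount between stages t and t+1;
            with a non-stationary interest rate r_t one could take
            gammas[t] = 1.0 / (1.0 + r_t)  (an illustrative assumption)
    Returns the value function V[t, s] and a greedy policy pi[t, s].
    """
    T, S, A, _ = P.shape
    V = np.zeros((T + 1, S))           # terminal values V[T, :] = 0
    pi = np.zeros((T, S), dtype=int)
    for t in reversed(range(T)):       # sweep backwards over stages
        # Q[s, a] = r_t(s, a) + gamma_t * sum_s' P_t(s' | s, a) V_{t+1}(s')
        Q = R[t] + gammas[t] * P[t] @ V[t + 1]
        V[t] = Q.max(axis=1)
        pi[t] = Q.argmax(axis=1)
    return V, pi
```

With a constant `gammas[t] = gamma` this collapses to classical finite-horizon backward induction, which is the sanity check one would expect any such variant to pass.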
Similar Resources
Lazy Approximation for Solving Continuous Finite-Horizon MDPs
Solving Markov decision processes (MDPs) with continuous state spaces is a challenge due to, among other problems, the well-known curse of dimensionality. Nevertheless, numerous real-world applications such as transportation planning and telescope observation scheduling exhibit a critical dependence on continuous states. Current approaches to continuous-state MDPs include discretizing their tra...
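This entry cites discretization as a current approach to continuous-state MDPs; below is a minimal sketch of the most naive variant, uniform grid discretization, which also makes the curse of dimensionality concrete (with b bins per axis the grid has b**d cells). All names and shapes are illustrative; this is not the paper's lazy-approximation method.

```python
import numpy as np

def make_grid(lo, hi, bins_per_dim):
    """Cell edges of a uniform grid over the box [lo, hi] in R^d."""
    return [np.linspace(l, h, b + 1)
            for l, h, b in zip(lo, hi, bins_per_dim)]

def state_to_cell(x, edges):
    """Map a continuous state x to the index tuple of its grid cell,
    clamping states outside the box onto the boundary cells."""
    cell = []
    for xi, e in zip(x, edges):
        i = int(np.searchsorted(e, xi, side="right")) - 1
        cell.append(max(0, min(i, len(e) - 2)))
    return tuple(cell)

# Example: a 2-D state space with 10 bins per axis -> 100 discrete states.
edges = make_grid(lo=[0.0, -1.0], hi=[1.0, 1.0], bins_per_dim=[10, 10])
print(state_to_cell([0.37, 0.05], edges))   # -> (3, 5)
```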
On the Use of Non-Stationary Policies for Infinite-Horizon Discounted Markov Decision Processes
We consider infinite-horizon γ-discounted Markov Decision Processes, for which it is known that there exists a stationary optimal policy. We consider the algorithm Value Iteration and the sequence of policies π1, …, πk it implicitly generates until some iteration k. We provide performance bounds for non-stationary policies involving the last m generated policies that reduce the state-of-t...
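The construction this entry studies, running Value Iteration and then deploying a non-stationary policy that cycles through the last m greedy policies instead of committing to the final one, can be sketched as follows for a tabular MDP. The shapes and names are illustrative assumptions; the exact bound-achieving scheme is in the paper.

```python
import numpy as np

def value_iteration_policies(P, R, gamma, k):
    """Run k iterations of Value Iteration on a tabular MDP
    (P of shape (S, A, S), R of shape (S, A)) and record the greedy
    policy implicitly generated at every iteration: pi_1, ..., pi_k."""
    V = np.zeros(P.shape[0])
    policies = []
    for _ in range(k):
        Q = R + gamma * P @ V          # one-step lookahead Q[s, a]
        policies.append(Q.argmax(axis=1))
        V = Q.max(axis=1)
    return V, policies

def cyclic_policy(policies, m):
    """Non-stationary policy that cycles through the last m greedy
    policies rather than committing to pi_k alone."""
    tail = policies[-m:][::-1]         # pi_k, pi_{k-1}, ..., pi_{k-m+1}
    def act(state, t):
        return int(tail[t % m][state])
    return act
```

With m = 1 this reduces to the usual stationary greedy policy π_k; the entry's bounds quantify how using larger m improves on that choice.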
PAC Bounds for Discounted MDPs
We study upper and lower bounds on the sample-complexity of learning near-optimal behaviour in finite-state discounted Markov Decision Processes (MDPs). We prove a new bound for a modified version of Upper Confidence Reinforcement Learning (UCRL) with only cubic dependence on the horizon. The bound is unimprovable in all parameters except the size of the state/action space, where it depends line...
Trial-Based Heuristic Tree Search for Finite Horizon MDPs
Dynamic programming is a well-known approach for solving MDPs. In large state spaces, asynchronous versions like Real-Time Dynamic Programming (RTDP) have been applied successfully. If unfolded into equivalent trees, Monte-Carlo Tree Search algorithms are a valid alternative. UCT, the most popular representative, obtains good anytime behavior by guiding the search towards promising areas of the...
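To make UCT's "guiding the search towards promising areas" concrete, here is the standard UCB1 selection rule it applies at each tree node; this is textbook UCT machinery, not code from the paper, and all names are illustrative.

```python
import math

def ucb1_select(node_visits, child_visits, child_values, c=math.sqrt(2)):
    """UCB1 action selection as used by UCT at each tree node:
    trade off the empirical mean return of an action (exploitation)
    against a bonus for actions tried rarely (exploration)."""
    def score(a):
        if child_visits[a] == 0:
            return float("inf")        # force every action to be tried once
        mean = child_values[a] / child_visits[a]
        bonus = c * math.sqrt(math.log(node_visits) / child_visits[a])
        return mean + bonus
    return max(range(len(child_visits)), key=score)

# Example: of three actions, the under-explored one gets the highest score.
print(ucb1_select(node_visits=30,
                  child_visits=[20, 9, 1],
                  child_values=[10.0, 5.4, 0.9]))   # -> 2
```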
Journal
Journal title: Folia Oeconomica Stetinensia
Year: 2023
ISSN: 1898-0198, 1730-4237
DOI: https://doi.org/10.2478/foli-2023-0001