Search results for: markov decision process

Number of results: 1,627,273

Journal: Proceedings of the ... AAAI Conference on Artificial Intelligence, 2022

In high-stakes scenarios like medical treatment and auto-piloting, it is risky or even infeasible to collect online experimental data to train the agent. Simulation-based training can alleviate this issue, but may suffer from inherent mismatches between the simulator and the real environment. It is therefore imperative to learn a robust policy for real-world deployment. In this work, we consider learning Robust Markov...

Journal: Communications in Computer and Information Science, 2021

Previous work on planning as active inference addresses finite-horizon problems and provides solutions valid for online planning. We propose solving the general Stochastic Shortest-Path Markov Decision Process (SSP MDP) as probabilistic inference. Furthermore, we discuss online and offline methods for planning under uncertainty. In an SSP MDP, the horizon is indefinite and unknown a priori. SSP MDPs generalize finite- and infinite-horizon MDPs and are widely used in artificial ...

Journal: IEEE Transactions on Automatic Control, 2023

This paper shows that the optimal policy and value functions of a Markov Decision Process (MDP), whether discounted or not, can be captured by a finite-horizon undiscounted Optimal Control Problem (OCP), even if it is based on an inexact model. This is achieved by selecting a proper stage cost and terminal cost for the OCP. A very useful particular case of the OCP is a Model Predictive Control (MPC) scheme where a deterministic (possibly nonlinear...

2013
Thomas Dueholm Hansen

We give an introduction to infinite-horizon Markov decision processes (MDPs) with finite sets of states and actions. We focus primarily on discounted MDPs for which we present Shapley’s (1953) value iteration algorithm and Howard’s (1960) policy iteration algorithm. We also give a short introduction to discounted turn-based stochastic games, a 2-player generalization of MDPs. Finally, we give a...
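The value iteration algorithm mentioned in this abstract can be sketched in a few lines: repeatedly apply the Bellman optimality backup until the values converge, then extract a greedy policy. The toy two-state MDP below (its states, actions, transitions, rewards, and the discount factor 0.9) is entirely illustrative, not taken from the surveyed work:

```python
# Value-iteration sketch for a discounted MDP with finite states and actions.
# The MDP below is a hypothetical toy example for illustration only.

# Transition model: (state, action) -> list of (probability, next_state, reward)
P = {
    (0, "a"): [(1.0, 1, 0.0)],
    (0, "b"): [(1.0, 0, 1.0)],
    (1, "a"): [(1.0, 0, 2.0)],
    (1, "b"): [(1.0, 1, 0.0)],
}
states = [0, 1]
actions = ["a", "b"]
gamma = 0.9  # discount factor

# Repeatedly apply the Bellman optimality backup (a gamma-contraction,
# so the iterates converge to the optimal value function).
V = {s: 0.0 for s in states}
for _ in range(1000):
    V = {
        s: max(
            sum(p * (r + gamma * V[s2]) for p, s2, r in P[(s, a)])
            for a in actions
        )
        for s in states
    }

# Extract a greedy (optimal) policy from the converged values.
policy = {
    s: max(
        actions,
        key=lambda a: sum(p * (r + gamma * V[s2]) for p, s2, r in P[(s, a)]),
    )
    for s in states
}
print(V, policy)
```

Howard's policy iteration instead alternates exact policy evaluation with greedy policy improvement; on small MDPs both reach the same optimal policy.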

2001
Brett L. Moore, Todd M. Quasny, Larry D. Pyeatt, Eric D. Sinzinger

Partially Observable Markov Decision Processes (POMDPs) have been applied extensively to planning in environments where knowledge of an underlying process is confounded by unknown factors [3, 4, 7]. By applying the POMDP architecture to basic recognition tasks, we introduce a novel pattern recognizer that operates under partially observable conditions. This Single Action Partially Observable Mar...

Journal: Annals OR, 2015
Yanling Chang, Alan L. Erera, Chelsea C. White

The leader-follower partially observed, multi-objective Markov game (LF-POMG) models a sequential decision making situation with two intelligent and adaptive decision makers, a leader and a follower, each of which can choose actions that affect the dynamics of the system and where these actions are selected on the basis of current and past but possibly inaccurate state observations. The decisio...
