نتایج جستجو برای: markov decision process graph theory
تعداد نتایج: 2385831 فیلتر نتایج به سال:
In sequential decision making under uncertainty, as in many other modeling endeavors, researchers observe a dynamical system and collect data measuring its behavior over time. These data are often used to build models that explain relationships between the measured variables, and are eventually used for planning and control purposes. However, these measurements can not always be exact, systems ...
A bandit problem with side observations is an extension of the traditional two-armed bandit problem, in which the decision maker has access to side information before deciding which arm to pull. In this paper, essential properties of the side observations that allow achievability results with respect to optimal regret are extracted and formalized. The sufficient conditions for good side informa...
Maintenance decision-making is essential to achieve safe and reliable operation with high performance for equipment. To avoid unexpected shutdown increase machine life as well system efficiency, it fundamental design an effective maintenance scheme In this paper, we propose a novel method equipment based on Long Short-Term Memory (LSTM) Markov decision process, which can provide specific strate...
Bohm–Bell processes, of interest in the foundations of quantum field theory, form a class of Markov processes Qt generalizing in a natural way both Bohm’s dynamical system in configuration space for nonrelativistic quantum mechanics and Bell’s jump process for lattice quantum field theories. They are such that at any time t the distribution of Qt is |ψt| 2 with ψ the wave function of quantum th...
This paper considers the verification of continuous-time Markov decision process (CTMDPs) against single-clock deterministic timed automata (DTA) specifications. The central issue is to compute the maximum probability of the set of timed paths of a CTMDP C that are accepted by a DTA A. We show that this problem can be reduced to a linear programming problem whose coefficients are maximum timed ...
The Partially Observable Markov Decision Process (POMDP) framework has proven useful in planning domains where agents must balance actions that provide knowledge and actions that provide reward. Unfortunately, most POMDPs are complex structures with a large number of parameters. In many real-world problems, both the structure and the parameters are difficult to specify from domain knowledge alo...
We seek to find the robust policy that maximizes the expected cumulative reward for the worst case when a partially observable Markov decision process (POMDP) has uncertain parameters whose values are only known to be in a given region. We prove that the robust value function, which represents the expected cumulative reward that can be obtained with the robust policy, is convex with respect to ...
In the course lectures, we have discussed a lot regarding unconstrained Markov Decision Process (MDP). The dynamic programming decomposition and optimal policies with MDP are also given. However, in this report we are going to discuss a different MDP model, which is constrained MDP. There are many realistic demand of studying constrained MDP. For instance, in the wireless sensors networks, each...
نمودار تعداد نتایج جستجو در هر سال
با کلیک روی نمودار نتایج را به سال انتشار فیلتر کنید