Long-Run Rewards for Markov Automata

Authors

Yuliya Butkova

Ralf Wimmer

Holger Hermanns

Abstract

Markov automata are a powerful formalism for modelling systems that exhibit nondeterminism, probabilistic choices, and continuous stochastic timing. We consider the computation of long-run average rewards, the most classical problem in continuous-time Markov model analysis. We propose an algorithm based on value iteration that improves on the state of the art by orders of magnitude. The contribution is rooted in a fresh look at Markov automata, namely treating them as an efficient encoding of continuous-time Markov decision processes (CTMDPs) with, in the worst case, exponentially more transitions.

Similar resources

Modelling and Analysis of Markov Reward Automata

Costs and rewards are important ingredients for many types of systems, modelling critical aspects like energy consumption, task completion, repair costs, and memory usage. This paper introduces Markov reward automata, an extension of Markov automata that allows the modelling of systems incorporating rewards (or costs) in addition to nondeterminism, discrete probabilistic choice and continuous s...

Extending Markov Automata with State and Action Rewards

This presentation introduces the Markov Reward Automaton (MRA), an extension of the Markov automaton that allows the modelling of systems incorporating rewards in addition to nondeterminism, discrete probabilistic choice and continuous stochastic timing. Our models support both rewards that are acquired instantaneously when taking certain transitions (action rewards) and rewards that are based ...

Utilizing Generalized Learning Automata for Finding Optimal Policies in MMDPs

Multi-agent Markov decision processes (MMDPs), the generalization of Markov decision processes to the multi-agent case, have long been used for modelling multi-agent systems and serve as a suitable framework for multi-agent reinforcement learning. In this paper, a generalized learning-automata-based algorithm for finding optimal policies in MMDPs is proposed. In the proposed algorithm, MMDP ...

Performance evaluation with temporal rewards

Today many formalisms exist for specifying complex Markov chains. In contrast, formalisms for specifying rewards, enabling the analysis of long-run average performance properties, have remained quite primitive. Basically, they only support the analysis of relatively simple performance metrics that can be expressed as long-run averages of atomic rewards, i.e. rewards that are deductible directly...
