Efficient strategy iteration for mean payoff in Markov decision processes

DOI: 10.1007/978-3-319-68167-2_25
MaRDI QID: Q5096097

Authors Jan Křetínský, Tobias Meggendorfer

Publication date 12 August 2022

Published in Automated Technology for Verification and Analysis

Full work available at URL https://arxiv.org/abs/1707.01859

Probability in computer science (algorithm analysis, random structures, phase transitions, etc.) (68Q87)
Applications of game theory (91A80)
Markov and semi-Markov decision processes (90C40)

Abstract: Markov decision processes (MDPs) are standard models for probabilistic systems with non-deterministic behaviours. Mean payoff (or long-run average reward) provides a mathematically elegant formalism to express performance-related properties. Strategy iteration is one of the solution techniques applicable in this context. While in many other contexts it is the technique of choice due to advantages over, e.g., value iteration, such as precision or the possibility of domain-knowledge-aware initialization, it is rarely used for MDPs, since there it scales worse than value iteration. We provide several techniques that speed up strategy iteration by orders of magnitude for many MDPs, eliminating the performance disadvantage while preserving all its advantages.
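
For readers unfamiliar with the technique named in the abstract, the following is a minimal sketch of textbook (Howard-style) strategy iteration for mean payoff. It is not the optimized method of the paper. It assumes a unichain MDP (every strategy induces a single recurrent class), so the gain is the same in every state and each evaluation step reduces to one linear system. The MDP encoding, the function names (solve, evaluate, strategy_iteration), and the two-state example are all hypothetical and chosen only for illustration. Exact rational arithmetic is used to reflect the precision advantage the abstract mentions.

```python
from fractions import Fraction as F

def solve(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination (A square, nonsingular)."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [x / p for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]

def evaluate(mdp, policy):
    """Return (gain, bias) of a policy, with the bias normalised so bias[0] == 0.

    Solves g + h(s) - sum_t P(s,t) h(t) = r(s) for all states s.
    ASSUMPTION: the policy is unichain, which makes the system nonsingular.
    """
    n = len(mdp)
    A, b = [], []
    for s in range(n):
        reward, succ = mdp[s][policy[s]]
        row = [F(0)] * n          # unknowns: (g, h[1], ..., h[n-1])
        row[0] = F(1)             # coefficient of the gain g
        if s != 0:
            row[s] += 1           # coefficient of h[s]
        for t, prob in succ.items():
            if t != 0:
                row[t] -= prob    # minus P(s,t) * h[t]
        A.append(row)
        b.append(F(reward))
    x = solve(A, b)
    return x[0], [F(0)] + x[1:]

def strategy_iteration(mdp, policy):
    """Alternate evaluation and greedy improvement until the policy is stable."""
    while True:
        gain, bias = evaluate(mdp, policy)

        def q(s, a):
            reward, succ = mdp[s][a]
            return reward + sum(prob * bias[t] for t, prob in succ.items())

        improved = list(policy)
        for s in range(len(mdp)):
            best = max(mdp[s], key=lambda a: q(s, a))
            if q(s, best) > q(s, policy[s]):   # switch only on strict improvement
                improved[s] = best
        if improved == policy:
            return gain, policy
        policy = improved

# Hypothetical example: mdp[state][action] = (reward, {successor: probability}).
mdp = [
    {"stay": (1, {0: F(1)}), "go": (0, {1: F(1)})},
    {"stay": (3, {1: F(9, 10), 0: F(1, 10)})},
]
gain, policy = strategy_iteration(mdp, ["stay", "stay"])
print(gain, policy)  # 30/11 ['go', 'stay']
```

Starting from the strategy that stays in state 0 (gain 1), one improvement step switches to "go", which yields the long-run average reward 30/11. The initial strategy passed to strategy_iteration is where domain knowledge could be injected, as the abstract suggests.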

Cited in (10)

Uses Software: PRISM
