Efficient strategy iteration for mean payoff in Markov decision processes

From MaRDI portal
Publication:5096097

DOI10.1007/978-3-319-68167-2_25zbMATH Open1495.68152arXiv1707.01859OpenAlexW2728937946MaRDI QIDQ5096097FDOQ5096097


Authors: Jan Křetínský, Tobias Meggendorfer Edit this on Wikidata


Publication date: 12 August 2022

Published in: Automated Technology for Verification and Analysis (Search for Journal in Brave)

Abstract: Markov decision processes (MDPs) are standard models for probabilistic systems with non-deterministic behaviours. Mean payoff (or long-run average reward) provides a mathematically elegant formalism to express performance related properties. Strategy iteration is one of the solution techniques applicable in this context. While in many other contexts it is the technique of choice due to advantages over e.g. value iteration, such as precision or possibility of domain-knowledge-aware initialization, it is rarely used for MDPs, since there it scales worse than value iteration. We provide several techniques that speed up strategy iteration by orders of magnitude for many MDPs, eliminating the performance disadvantage while preserving all its advantages.


Full work available at URL: https://arxiv.org/abs/1707.01859




Recommendations




Cited In (10)

Uses Software





This page was built for publication: Efficient strategy iteration for mean payoff in Markov decision processes

Report a bug (only for logged in users!)Click here to report a bug for this page (MaRDI item Q5096097)