Efficient strategy iteration for mean payoff in Markov decision processes
From MaRDI portal
Publication:5096097
Abstract: Markov decision processes (MDPs) are standard models for probabilistic systems with non-deterministic behaviours. Mean payoff (or long-run average reward) provides a mathematically elegant formalism to express performance-related properties. Strategy iteration is one of the solution techniques applicable in this context. While in many other contexts it is the technique of choice due to advantages over, e.g., value iteration, such as precision or the possibility of domain-knowledge-aware initialization, it is rarely used for MDPs, since there it scales worse than value iteration. We provide several techniques that speed up strategy iteration by orders of magnitude for many MDPs, eliminating the performance disadvantage while preserving all its advantages.
Recommendations
- Value iteration for long-run average reward in Markov decision processes
- PAC Statistical Model Checking of Mean Payoff in Discrete- and Continuous-Time MDP
- Optimizing the expected mean payoff in energy Markov decision processes
- Markov decision processes with multiple long-run average objectives
- Unifying Two Views on Multiple Mean-Payoff Objectives in Markov Decision Processes
Cited in (10)
- Finite-memory strategy synthesis for robust multidimensional mean-payoff objectives
- scientific article; zbMATH DE number 5726545
- Comparison of algorithms for simple stochastic games
- Optimizing the expected mean payoff in energy Markov decision processes
- Comparison of algorithms for simple stochastic games
- Multi-objective optimization of long-run average and total rewards
- scientific article; zbMATH DE number 3936962
- PAC Statistical Model Checking of Mean Payoff in Discrete- and Continuous-Time MDP
- Value iteration for simple stochastic games: stopping criterion and learning algorithm
- Symblicit algorithms for mean-payoff and shortest path in monotonic Markov decision processes