Trading performance for stability in Markov decision processes
From MaRDI portal
Publication:340568
DOI10.1016/J.JCSS.2016.09.009zbMATH Open1359.90147arXiv1305.4103OpenAlexW2537440665MaRDI QIDQ340568FDOQ340568
Authors: Tomáš Brázdil, Krishnendu Chatterjee, Vojtech Forejt, Antonin Kučera
Publication date: 14 November 2016
Published in: Journal of Computer and System Sciences (Search for Journal in Brave)
Abstract: We study the complexity of central controller synthesis problems for finite-state Markov decision processes, where the objective is to optimize both the expected mean-payoff performance of the system and its stability. We argue that the basic theoretical notion of expressing the stability in terms of the variance of the mean-payoff (called global variance in our paper) is not always sufficient, since it ignores possible instabilities on respective runs. For this reason we propose alernative definitions of stability, which we call local and hybrid variance, and which express how rewards on each run deviate from the run's own mean-payoff and from the expected mean-payoff, respectively. We show that a strategy ensuring both the expected mean-payoff and the variance below given bounds requires randomization and memory, under all the above semantics of variance. We then look at the problem of determining whether there is a such a strategy. For the global variance, we show that the problem is in PSPACE, and that the answer can be approximated in pseudo-polynomial time. For the hybrid variance, the analogous decision problem is in NP, and a polynomial-time approximating algorithm also exists. For local variance, we show that the decision problem is in NP. Since the overall performance can be traded for stability (and vice versa), we also present algorithms for approximating the associated Pareto curve in all the three cases. Finally, we study a special case of the decision problems, where we require a given expected mean-payoff together with zero variance. Here we show that the problems can be all solved in polynomial time.
Full work available at URL: https://arxiv.org/abs/1305.4103
Recommendations
- Trading performance for stability in Markov decision processes
- On the controller synthesis for finite-state Markov decision processes
- FSTTCS 2005: Foundations of Software Technology and Theoretical Computer Science
- Variability Sensitive Markov Decision Processes
- Markov decision problems where means bound variances
Cites Work
- Some NP-complete problems in quadratic and nonlinear programming
- Symbolic algorithms for qualitative analysis of Markov decision processes with Büchi objectives
- Efficient and dynamic algorithms for alternating Büchi games and maximal end-component decomposition
- Title not available (Why is that?)
- Faster and dynamic algorithms for maximal end-component decomposition and related graph problems in probabilistic verification
- An \(O(n^2)\) time algorithm for alternating Büchi games
- Markov Chains
- Markov Decision Processes with Multiple Objectives
- Finite state Markovian decision processes
- Approximation algorithms for indefinite quadratic programming
- Quadratic programming with one negative eigenvalue is NP-hard
- Title not available (Why is that?)
- Quantitative multi-objective verification for probabilistic systems
- Variance-Penalized Markov Decision Processes
- Mean-Variance Tradeoffs in an Undiscounted MDP
- Mean-Variance Tradeoffs in an Undiscounted MDP: The Unichain Case
- Markov decision processes and regular events
- The variance of discounted Markov decision processes
- Strong laws of large numbers for weakly correlated random variables
- Quadratic programming is in NP
- Multi-Objective Model Checking of Markov Decision Processes
- Pareto curves for probabilistic model checking
- Trading performance for stability in Markov decision processes
- Markov decision processes with multiple long-run average objectives
Cited In (11)
- Title not available (Why is that?)
- CONIC TRADING IN A MARKOVIAN STEADY STATE
- Life is random, time is not: Markov decision processes with window objectives
- Timed games with bounded window parity objectives
- Trading strategies generated by Lyapunov functions
- Multi-objective optimization of long-run average and total rewards
- Multi-cost bounded tradeoff analysis in MDP
- Stochastic games with lexicographic objectives
- Trading performance for stability in Markov decision processes
- Entropic risk for turn-based stochastic games
- A Stochastic Approximation Approach for Trend-Following Trading
This page was built for publication: Trading performance for stability in Markov decision processes
Report a bug (only for logged in users!)Click here to report a bug for this page (MaRDI item Q340568)