A unified algorithm framework for mean-variance optimization in discounted Markov decision processes
From MaRDI portal
Publication: 6096629
DOI: 10.1016/j.ejor.2023.06.022
arXiv: 2201.05737
MaRDI QID: Q6096629
Publication date: 15 September 2023
Published in: European Journal of Operational Research
Abstract: This paper studies risk-averse mean-variance optimization in infinite-horizon discounted Markov decision processes (MDPs). The variance metric captures reward variability over the whole process, with future deviations discounted to their present values. This discounted mean-variance optimization yields a reward function that depends on the discounted mean, and this dependency renders traditional dynamic programming methods inapplicable because it destroys a crucial property: time consistency. To deal with this unorthodox problem, we introduce a pseudo mean that transforms the intractable MDP into a standard one with a redefined reward function, and we derive a discounted mean-variance performance difference formula. Building on the pseudo mean, we propose a unified algorithm framework with a bilevel optimization structure for discounted mean-variance optimization. The framework unifies a variety of algorithms for several variance-related problems, including, but not limited to, risk-averse variance and mean-variance optimizations in discounted and average MDPs. Furthermore, convergence analyses missing from the literature can be supplied within the proposed framework. Taking value iteration as an example, we develop a discounted mean-variance value iteration algorithm and prove its convergence to a local optimum with the aid of a Bellman local-optimality equation. Finally, we conduct a numerical experiment on portfolio management to validate the proposed algorithm.
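To make the construction concrete, here is a minimal sketch in plausible notation; the symbols below (discount factor \(\gamma\), risk-aversion weight \(\beta\), per-step mean \(\mu(\pi)\), pseudo mean \(\lambda\)) are assumptions for illustration and are not quoted from the paper. The discounted mean and variance metrics of a policy \(\pi\) can be written as

\[
J_\mu(\pi) = \mathbb{E}^\pi\Big[\sum_{t=0}^{\infty}\gamma^t\, r(s_t,a_t)\Big],
\qquad
J_\sigma(\pi) = \mathbb{E}^\pi\Big[\sum_{t=0}^{\infty}\gamma^t\big(r(s_t,a_t)-\mu(\pi)\big)^2\Big],
\]

with combined objective \(\eta(\pi)=J_\mu(\pi)-\beta\, J_\sigma(\pi)\). The policy-dependent term \(\mu(\pi)\) inside the square is what breaks time consistency. Replacing it with a fixed pseudo mean \(\lambda\) yields a standard MDP with the redefined reward

\[
f_\lambda(s,a) = r(s,a) - \beta\big(r(s,a)-\lambda\big)^2,
\]

which ordinary dynamic programming can solve for each fixed \(\lambda\); an outer level then updates \(\lambda\), giving the bilevel structure mentioned in the abstract.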
Full work available at URL: https://arxiv.org/abs/2201.05737
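The bilevel structure lends itself to a short illustration. The following Python sketch (hypothetical function and variable names; it follows the pseudo-mean construction assumed above, not necessarily the paper's exact algorithm) runs standard value iteration on the transformed reward for a fixed pseudo mean in the inner level, then resets the pseudo mean from the attained discounted mean in the outer level:

```python
import numpy as np

def mean_variance_vi(P, r, gamma=0.95, beta=0.5, tol=1e-8, outer_iters=50):
    """Bilevel value-iteration sketch for discounted mean-variance MDPs.

    P : (S, A, S) transition tensor, r : (S, A) reward matrix.
    beta is an assumed risk-aversion weight; lam is the pseudo mean.
    """
    S, A = r.shape
    lam = float(r.mean())            # initialize the pseudo mean
    policy = np.zeros(S, dtype=int)
    for _ in range(outer_iters):
        # Inner level: value iteration on the transformed reward
        # f_lam(s, a) = r(s, a) - beta * (r(s, a) - lam)^2.
        f = r - beta * (r - lam) ** 2
        V = np.zeros(S)
        while True:
            Q = f + gamma * (P @ V)          # (S, A) action values
            V_new = Q.max(axis=1)
            if np.max(np.abs(V_new - V)) < tol:
                break
            V = V_new
        policy = Q.argmax(axis=1)
        # Outer level: reset the pseudo mean to the normalized discounted
        # mean reward attained by the current greedy policy.
        P_pi = P[np.arange(S), policy]       # (S, S) policy transition matrix
        r_pi = r[np.arange(S), policy]       # (S,) policy reward vector
        v_mu = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
        lam_new = (1 - gamma) * v_mu.mean()  # uniform initial distribution
        if abs(lam_new - lam) < tol:
            break
        lam = lam_new
    return policy, lam
```

Each inner problem is an ordinary discounted MDP, so any standard solver (policy iteration, linear programming) could replace the value-iteration step; the abstract's convergence result concerns local optima of the coupled bilevel system, which this sketch does not attempt to certify.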
Keywords: dynamic programming; bilevel optimization; Markov decision process; Bellman local-optimality equation; discounted mean-variance
Cites Work
- Continuous-time mean-variance portfolio selection: a stochastic LQ framework
- Optimal dynamic portfolio selection: multiperiod mean-variance formulation
- Advances in prospect theory: cumulative representation of uncertainty
- Markowitz's Mean-Variance Portfolio Selection with Regime Switching: A Continuous-Time Model
- A possibilistic mean-semivariance-entropy model for multi-period portfolio selection with transaction costs
- Sensitivity Analysis for Mean-Variance Portfolio Problems
- Variance-Penalized Markov Decision Processes
- Mean-Variance Tradeoffs in an Undiscounted MDP
- Mean-Variance Tradeoffs in an Undiscounted MDP: The Unichain Case
- The variance of discounted Markov decision processes
- Mean-variance optimization of discrete time discounted Markov decision processes
- A mean-variance optimization problem for discounted Markov decision processes
- Analysis and improvement of policy gradient estimation
- Mean-variance analysis of option contracts in a two-echelon supply chain
- Sample-Path Optimality and Variance-Minimization of Average Cost Markov Control Processes
- Survey on multi-period mean-variance portfolio selection model
- Optimization of Markov decision processes under the variance criterion
- Variance-penalized Markov decision processes: dynamic programming and reinforcement learning techniques
- Multilevel optimization modeling for risk-averse stochastic programming
- Portfolio Optimization with Nonparametric Value at Risk: A Block Coordinate Descent Method
Cited In: 2 documents