On Convergence of Value Iteration for a Class of Total Cost Markov Decision Processes
From MaRDI portal
Publication: 5502179
DOI: 10.1137/141000294
zbMath: 1327.90364
arXiv: 1411.1459
OpenAlex: W1752208072
MaRDI QID: Q5502179
Publication date: 18 August 2015
Published in: SIAM Journal on Control and Optimization
Full work available at URL: https://arxiv.org/abs/1411.1459
Keywords: convergence; dynamic programming; Markov decision processes; value iteration; discrete-time stochastic optimal control; infinite spaces
MSC: Dynamic programming (90C39); Optimal stochastic control (93E20); Markov and semi-Markov decision processes (90C40)
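The publication studies convergence of value iteration under the total cost criterion. As a minimal illustrative sketch only (the paper itself treats infinite Borel state spaces and universally measurable policies, which this finite example does not capture), the following Python snippet iterates the total-cost Bellman operator on a small finite MDP; the names `value_iteration`, `cost`, and `P` are hypothetical and not taken from the paper.

```python
import numpy as np

def value_iteration(cost, P, n_iter=1000, tol=1e-9):
    """Iterate the total-cost Bellman operator J <- T J on a finite MDP.

    cost -- array of shape (n_states, n_actions), one-stage costs c(s, a)
    P    -- list of (n_states, n_states) matrices, P[a][s, t] = Pr(t | s, a)
    """
    n_states, n_actions = cost.shape
    J = np.zeros(n_states)  # J_0 = 0, a standard starting point for value iteration
    for _ in range(n_iter):
        # Bellman update: (T J)(s) = min_a [ c(s, a) + sum_t P(t | s, a) J(t) ]
        Q = cost + np.stack([P[a] @ J for a in range(n_actions)], axis=1)
        J_next = Q.min(axis=1)
        if np.max(np.abs(J_next - J)) < tol:
            return J_next
        J = J_next
    return J

# Tiny stochastic shortest path instance: state 2 is absorbing and cost-free.
cost = np.array([[1.0, 2.0],
                 [1.0, 0.5],
                 [0.0, 0.0]])
P = [np.array([[0.0, 0.9, 0.1],
               [0.0, 0.0, 1.0],
               [0.0, 0.0, 1.0]]),
     np.array([[0.0, 0.0, 1.0],
               [0.5, 0.0, 0.5],
               [0.0, 0.0, 1.0]])]
print(value_iteration(cost, P))  # converges to approximately [1.9, 1.0, 0.0]
```

In this stochastic shortest path instance the iterates converge to the optimal total cost vector; the cited paper establishes when such convergence holds in far more general infinite-space settings.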
Related Items (5)
- Markov Decision Processes with Incomplete Information and Semiuniform Feller Transition Probabilities
- Regular Policies in Abstract Dynamic Programming
- Open Problem—Convergence and Asymptotic Optimality of the Relative Value Iteration in Ergodic Control
- Average Cost Optimality Inequality for Markov Decision Processes with Borel Spaces and Universally Measurable Policies
- MDPs with setwise continuous transition probabilities
Cites Work
- Stationary policies and Markov policies in Borel dynamic programming
- Stochastic optimal control. The discrete time case
- The optimal reward operator in dynamic programming
- Asynchronous stochastic approximation and Q-learning
- Value iteration and optimization of multiclass queueing networks
- The Expected Total Cost Criterion for Markov Decision Processes under Constraints
- Average Cost Markov Decision Processes with Weakly Continuous Transition Probabilities
- The Expected Total Cost Criterion for Markov Decision Processes under Constraints: A Convex Analytic Approach
- Stationary Policies in Dynamic Programming Models Under Compactness Assumptions
- A Mixed Value and Policy Iteration Method for Stochastic Control with Universally Measurable Policies
- Algorithms for Reinforcement Learning
- A simple condition for regularity in negative programming
- A simple proof of Whittle's bridging condition in dynamic programming
- An Analysis of Stochastic Shortest Path Problems
- The Optimal Reward Operator in Negative Dynamic Programming
- Conditions for optimality in dynamic programming and for the limit of n-stage optimal policies to be optimal
- On the Optimality of Structured Policies in Countable Stage Decision Processes. II: Positive and Negative Problems
- Monotone Mappings with Application in Dynamic Programming
- Universally Measurable Policies in Dynamic Programming
- Real Analysis and Probability
- Control Techniques for Complex Networks
- A Borel Set Not Containing a Graph
- On the Existence of Stationary Optimal Strategies