A Mixed Value and Policy Iteration Method for Stochastic Control with Universally Measurable Policies
From MaRDI portal
Publication:3465941
DOI: 10.1287/moor.2014.0704 · zbMath: 1329.90157 · arXiv: 1308.3814 · OpenAlex: W2126905415 · MaRDI QID: Q3465941
Huizhen Yu, Dimitri P. Bertsekas
Publication date: 29 January 2016
Published in: Mathematics of Operations Research
Full work available at URL: https://arxiv.org/abs/1308.3814
Keywords: convergence; measurability; policy iteration; value iteration; discrete-time stochastic control; Borel spaces; Markov decision process; total cost criteria
MSC classification: Dynamic programming (90C39); Optimal stochastic control (93E20); Markov and semi-Markov decision processes (90C40)
Related Items
- Regular Policies in Abstract Dynamic Programming
- Average Cost Optimality Inequality for Markov Decision Processes with Borel Spaces and Universally Measurable Policies
- On the Minimum Pair Approach for Average Cost Markov Decision Processes with Countable Discrete Action Spaces and Strictly Unbounded Costs
- On Convergence of Value Iteration for a Class of Total Cost Markov Decision Processes
- Robust shortest path planning and semicontractive dynamic programming
Cites Work
- Q-learning and policy iteration algorithms for stochastic shortest path problems
- The optimal reward operator in dynamic programming
- The optimal reward operator in special classes of dynamic programming problems
- Borel-programmable functions
- Value iteration and optimization of multiclass queueing networks
- Q-Learning and Enhanced Policy Iteration in Discounted Dynamic Programming
- Average Cost Markov Decision Processes with Weakly Continuous Transition Probabilities
- A simple condition for regularity in negative programming
- Stability and characterisation conditions in negative programming
- A simple proof of Whittle's bridging condition in dynamic programming
- The Optimal Reward Operator in Negative Dynamic Programming
- Conditions for optimality in dynamic programming and for the limit of n-stage optimal policies to be optimal
- On the Optimality of Structured Policies in Countable Stage Decision Processes. II: Positive and Negative Problems
- Monotone Mappings with Application in Dynamic Programming
- Borel-approachable functions
- Alternative Theoretical Frameworks for Finite Horizon Discrete-Time Stochastic Optimal Control
- Universally Measurable Policies in Dynamic Programming
- Markovian Decision Processes with Compact Action Spaces
- Discrete-Time Controlled Markov Processes with Average Cost Criterion: A Survey
- Memoryless Strategies in Finite-Stage Dynamic Programming
- Discounted Dynamic Programming
- Negative Dynamic Programming
- On Finding Optimal Policies in Discrete Dynamic Programming with No Discounting
- A Borel Set Not Containing a Graph
- Discrete Dynamic Programming with a Small Interest Rate
- Discrete Dynamic Programming with Sensitive Discount Optimality Criteria
- Infinite time reachability of state-space regions by using feedback control
- Non-Existence of Everywhere Proper Conditional Distributions
- Handbook of Markov decision processes. Methods and applications