A time aggregation approach to Markov decision processes
From MaRDI portal
Publication:1614322
DOI: 10.1016/S0005-1098(01)00282-5, zbMath: 1026.93054, MaRDI QID: Q1614322
Zhiyuan Ren, Shalabh Bhatnagar, Steven I. Marcus, Michael C. Fu, Xiren Cao
Publication date: 5 September 2002
Published in: Automatica
Discrete-time control/observation systems (93C55); Estimation and detection in stochastic control theory (93E10); Optimal stochastic control (93E20); Markov and semi-Markov decision processes (90C40)
Related Items (12)
Event-based optimization approach for solving stochastic decision problems with probabilistic constraint ⋮ Solving average cost Markov decision processes by means of a two-phase time aggregation algorithm ⋮ Revenue management for operations with urgent orders ⋮ The control of a two-level Markov decision process by time aggregation ⋮ Time aggregated Markov decision processes via standard dynamic programming ⋮ Potentials based optimization with embedded Markov chain for stochastic constrained system ⋮ A multi-cluster time aggregation approach for Markov chains ⋮ A tutorial on event-based optimization -- a new optimization framework ⋮ Basic ideas for event-based optimization of Markov systems ⋮ A Sensitivity‐Based Construction Approach to Variance Minimization of Markov Decision Processes ⋮ Approximation of average cost Markov decision processes using empirical distributions and concentration inequalities ⋮ A unified approach to time-aggregated Markov decision processes
Cites Work
- Unnamed Item
- Unnamed Item
- Single sample path-based optimization of Markov chains
- Dependability for systems with a partitioned state space: Markov and semi-Markov theory and computational implementation
- The relations among potentials, perturbation analysis, and Markov decision processes
- Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning
- Average cost temporal-difference learning
- \({\mathcal Q}\)-learning
- A unified approach to Markov decision problems and performance sensitivity analysis
- Aggregation of the policy iteration method for nearly completely decomposable Markov chains
- Performance gradient estimation for the very large finite Markov chains
- Multilayer control of large Markov chains
- Using Randomization to Break the Curse of Dimensionality
- Simulation-based optimization of Markov reward processes