A time aggregation approach to Markov decision processes
Publication:1614322
DOI: 10.1016/S0005-1098(01)00282-5, zbMath: 1026.93054, MaRDI QID: Q1614322
Xiren Cao, Shalabh Bhatnagar, Steven I. Marcus, Zhiyuan Ren, Michael C. Fu
Publication date: 5 September 2002
Published in: Automatica
93C55: Discrete-time control/observation systems
93E10: Estimation and detection in stochastic control theory
93E20: Optimal stochastic control
90C40: Markov and semi-Markov decision processes
Related Items
- Potentials based optimization with embedded Markov chain for stochastic constrained system
- Time aggregated Markov decision processes via standard dynamic programming
- Basic ideas for event-based optimization of Markov systems
- The control of a two-level Markov decision process by time aggregation
Cites Work
- Single sample path-based optimization of Markov chains
- Dependability for systems with a partitioned state space: Markov and semi-Markov theory and computational implementation
- The relations among potentials, perturbation analysis, and Markov decision processes
- Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning
- Average cost temporal-difference learning
- \({\mathcal Q}\)-learning
- A unified approach to Markov decision problems and performance sensitivity analysis
- Aggregation of the policy iteration method for nearly completely decomposable Markov chains
- Performance gradient estimation for the very large finite Markov chains
- Multilayer control of large Markov chains
- Using Randomization to Break the Curse of Dimensionality
- Simulation-based optimization of Markov reward processes