On the Empirical State-Action Frequencies in Markov Decision Processes Under General Policies
DOI: 10.1287/moor.1050.0148
zbMATH Open: 1082.90131
OpenAlex: W2136937392
MaRDI QID: Q5704236
FDO: Q5704236
Shie Mannor, John N. Tsitsiklis
Publication date: 11 November 2005
Published in: Mathematics of Operations Research
Full work available at URL: https://semanticscholar.org/paper/1c93280ca091393d0653ef1e21874f761d6b3653
Recommendations
- Fast convergence to state-action frequency polytopes for MDPs
- Achieving Target State-Action Frequencies in Multichain Average-Reward Markov Decision Processes
- Markov Decision Problems and State-Action Frequencies
- scientific article; zbMATH DE number 1440130
- Simulation-based optimization of Markov decision processes: an empirical process theory approach
MSC classification:
- Convergence of probability measures (60B10)
- Limit theorems in probability theory (60F99)
- Markov and semi-Markov decision processes (90C40)
Cited In (6)
- Fast convergence to state-action frequency polytopes for MDPs
- Fluctuation Bounds for the Max-Weight Policy with Applications to State Space Collapse
- NP-hardness of checking the unichain condition in average cost MDPs
- Simulation-based optimization of Markov decision processes: an empirical process theory approach
- Acceptable strategy profiles in stochastic games
- Stationary anonymous sequential games with undiscounted rewards