On the Empirical State-Action Frequencies in Markov Decision Processes Under General Policies
Publication:5704236
Recommendations
- Fast convergence to state-action frequency polytopes for MDPs
- Achieving Target State-Action Frequencies in Multichain Average-Reward Markov Decision Processes
- Markov Decision Problems and State-Action Frequencies
- scientific article; zbMATH DE number 1440130
- Simulation-based optimization of Markov decision processes: an empirical process theory approach
Cited in (6)
- Fast convergence to state-action frequency polytopes for MDPs
- Fluctuation Bounds for the Max-Weight Policy with Applications to State Space Collapse
- NP-hardness of checking the unichain condition in average cost MDPs
- Simulation-based optimization of Markov decision processes: an empirical process theory approach
- Acceptable strategy profiles in stochastic games
- Stationary anonymous sequential games with undiscounted rewards