CONVERGENCE OF SIMULATION-BASED POLICY ITERATION
From MaRDI portal
Publication:4450393
DOI: 10.1017/S0269964803172051
zbMath: 1053.90129
MaRDI QID: Q4450393
William L. Cooper, Mark E. Lewis, Shane G. Henderson
Publication date: 15 February 2004
Published in: Probability in the Engineering and Informational Sciences
Applications of statistics to actuarial sciences and financial mathematics (62P05); Markov and semi-Markov decision processes (90C40); Sequential estimation (62L12)
Related Items (10)
Some Limit Properties of Markov Chains Induced by Recursive Stochastic Algorithms ⋮ New approximate dynamic programming algorithms for large-scale undiscounted Markov decision processes and their application to optimize a production and distribution system ⋮ Queueing Network Controls via Deep Reinforcement Learning ⋮ Finding optimal memoryless policies of POMDPs under the expected average reward criterion ⋮ A unified approach to Markov decision problems and performance sensitivity analysis with discounted and average criteria: multichain cases ⋮ Basic ideas for event-based optimization of Markov systems ⋮ Coupling based estimation approaches for the average reward performance potential in Markov chains ⋮ Empirical Dynamic Programming ⋮ A perturbation approach to approximate value iteration for average cost Markov decision processes with Borel spaces and bounded costs ⋮ Approximation of average cost Markov decision processes using empirical distributions and concentration inequalities