CONVERGENCE OF SIMULATION-BASED POLICY ITERATION

From MaRDI portal

Publication:4450393


DOI: 10.1017/S0269964803172051
zbMath: 1053.90129
MaRDI QID: Q4450393

William L. Cooper, Mark E. Lewis, Shane G. Henderson

Publication date: 15 February 2004

Published in: Probability in the Engineering and Informational Sciences

Mathematics Subject Classification ID

Applications of statistics to actuarial sciences and financial mathematics (62P05)
Markov and semi-Markov decision processes (90C40)
Sequential estimation (62L12)

Related Items (10)

Some Limit Properties of Markov Chains Induced by Recursive Stochastic Algorithms
New approximate dynamic programming algorithms for large-scale undiscounted Markov decision processes and their application to optimize a production and distribution system
Queueing Network Controls via Deep Reinforcement Learning
Finding optimal memoryless policies of POMDPs under the expected average reward criterion
A unified approach to Markov decision problems and performance sensitivity analysis with discounted and average criteria: multichain cases
Basic ideas for event-based optimization of Markov systems
Coupling based estimation approaches for the average reward performance potential in Markov chains
Empirical Dynamic Programming
A perturbation approach to approximate value iteration for average cost Markov decision processes with Borel spaces and bounded costs
Approximation of average cost Markov decision processes using empirical distributions and concentration inequalities

