Q-Learning and Enhanced Policy Iteration in Discounted Dynamic Programming
Publication:2884305
DOI: 10.1287/moor.1110.0532
zbMath: 1243.90231
OpenAlex: W2148864095
MaRDI QID: Q2884305
Huizhen Yu, Dimitri P. Bertsekas
Publication date: 24 May 2012
Published in: Mathematics of Operations Research
Full work available at URL: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.294.8483
Markov decision processes; reinforcement learning; policy iteration; Q-learning; value iteration; stochastic approximation
Dynamic programming (90C39); Optimal stochastic control (93E20); Stochastic approximation (62L20); Markov and semi-Markov decision processes (90C40); Distributed algorithms (68W15)
Related Items (12)
Dynamic Programming Deconstructed: Transformations of the Bellman Equation and Computational Efficiency ⋮ Some Limit Properties of Markov Chains Induced by Recursive Stochastic Algorithms ⋮ Approximate policy iteration: a survey and some new methods ⋮ A Mixed Value and Policy Iteration Method for Stochastic Control with Universally Measurable Policies ⋮ Dynamic shortest path problems: hybrid routing policies considering network disruptions ⋮ On the convergence of reinforcement learning with Monte Carlo exploring starts ⋮ Q-learning and policy iteration algorithms for stochastic shortest path problems ⋮ (Approximate) iterated successive approximations algorithm for sequential decision processes ⋮ Error bounds for constant step-size \(Q\)-learning ⋮ A Q-Learning Approach for Investment Decisions ⋮ Proximal algorithms and temporal difference methods for solving fixed point problems ⋮ Robust shortest path planning and semicontractive dynamic programming