Q-Learning and Enhanced Policy Iteration in Discounted Dynamic Programming

From MaRDI portal

Publication:2884305

DOI: 10.1287/moor.1110.0532; zbMath: 1243.90231; OpenAlex: W2148864095; MaRDI QID: Q2884305

Huizhen Yu, Dimitri P. Bertsekas

Publication date: 24 May 2012

Published in: Mathematics of Operations Research

Full work available at URL: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.294.8483

zbMATH Keywords

Markov decision processes; reinforcement learning; policy iteration; Q-learning; value iteration; stochastic approximation

Mathematics Subject Classification ID

Dynamic programming (90C39); Optimal stochastic control (93E20); Stochastic approximation (62L20); Markov and semi-Markov decision processes (90C40); Distributed algorithms (68W15)

Related Items (12)

Dynamic Programming Deconstructed: Transformations of the Bellman Equation and Computational Efficiency ⋮ Some Limit Properties of Markov Chains Induced by Recursive Stochastic Algorithms ⋮ Approximate policy iteration: a survey and some new methods ⋮ A Mixed Value and Policy Iteration Method for Stochastic Control with Universally Measurable Policies ⋮ Dynamic shortest path problems: hybrid routing policies considering network disruptions ⋮ On the convergence of reinforcement learning with Monte Carlo exploring starts ⋮ Q-learning and policy iteration algorithms for stochastic shortest path problems ⋮ (Approximate) iterated successive approximations algorithm for sequential decision processes ⋮ Error bounds for constant step-size \(Q\)-learning ⋮ A Q-Learning Approach for Investment Decisions ⋮ Proximal algorithms and temporal difference methods for solving fixed point problems ⋮ Robust shortest path planning and semicontractive dynamic programming

