On Boundedness of Q-Learning Iterates for Stochastic Shortest Path Problems
Publication:5169662
DOI: 10.1287/moor.1120.0562, zbMath: 1291.90296, OpenAlex: W2161950876, MaRDI QID: Q5169662
Huizhen Yu, Dimitri P. Bertsekas
Publication date: 11 July 2014
Published in: Mathematics of Operations Research
Full work available at URL: http://hdl.handle.net/1721.1/93744
dynamic programming, Markov decision processes, stochastic approximation, reinforcement learning, Q-learning
Dynamic programming (90C39); Optimal stochastic control (93E20); Stochastic approximation (62L20); Markov and semi-Markov decision processes (90C40); Distributed algorithms (68W15)
Related Items (3)
Q-learning and policy iteration algorithms for stochastic shortest path problems ⋮ Approachability in Stackelberg stochastic games with vector costs ⋮ Empirical Q-Value Iteration