On Boundedness of Q-Learning Iterates for Stochastic Shortest Path Problems

Publication:5169662

DOI: 10.1287/moor.1120.0562; zbMath: 1291.90296; OpenAlex: W2161950876; MaRDI QID: Q5169662

Huizhen Yu, Dimitri P. Bertsekas

Publication date: 11 July 2014

Published in: Mathematics of Operations Research

Full work available at URL: http://hdl.handle.net/1721.1/93744

zbMATH Keywords

dynamic programming; Markov decision processes; stochastic approximation; reinforcement learning; Q-learning

Mathematics Subject Classification ID

Dynamic programming (90C39); Optimal stochastic control (93E20); Stochastic approximation (62L20); Markov and semi-Markov decision processes (90C40); Distributed algorithms (68W15)

Related Items (3)

Q-learning and policy iteration algorithms for stochastic shortest path problems ⋮ Approachability in Stackelberg stochastic games with vector costs ⋮ Empirical Q-Value Iteration
