Q-learning and policy iteration algorithms for stochastic shortest path problems

Publication:378731

DOI: 10.1007/S10479-012-1128-Z ⋮ zbMath: 1306.90171 ⋮ OpenAlex: W2027855416 ⋮ Wikidata: Q115147448 ⋮ Scholia: Q115147448 ⋮ MaRDI QID: Q378731

Huizhen Yu, Dimitri P. Bertsekas

Publication date: 12 November 2013

Published in: Annals of Operations Research

Full work available at URL: https://doi.org/10.1007/s10479-012-1128-z

zbMATH Keywords

Markov decision processes ⋮ stochastic approximation ⋮ policy iteration ⋮ stochastic shortest paths ⋮ value iteration ⋮ approximate dynamic programming ⋮ Q-learning

Mathematics Subject Classification ID

Dynamic programming (90C39) ⋮ Markov and semi-Markov decision processes (90C40)

Related Items (5)

A Mixed Value and Policy Iteration Method for Stochastic Control with Universally Measurable Policies ⋮ Error bounds for constant step-size \(Q\)-learning ⋮ Proximal algorithms and temporal difference methods for solving fixed point problems ⋮ Robust shortest path planning and semicontractive dynamic programming ⋮ Fundamental design principles for reinforcement learning algorithms
