Q-learning and policy iteration algorithms for stochastic shortest path problems
Publication:378731
DOI: 10.1007/S10479-012-1128-Z; zbMath: 1306.90171; OpenAlex: W2027855416; Wikidata: Q115147448; Scholia: Q115147448; MaRDI QID: Q378731
Huizhen Yu, Dimitri P. Bertsekas
Publication date: 12 November 2013
Published in: Annals of Operations Research
Full work available at URL: https://doi.org/10.1007/s10479-012-1128-z
Keywords: Markov decision processes; stochastic approximation; policy iteration; stochastic shortest paths; value iteration; approximate dynamic programming; Q-learning
Related Items (5)
A Mixed Value and Policy Iteration Method for Stochastic Control with Universally Measurable Policies ⋮ Error bounds for constant step-size \(Q\)-learning ⋮ Proximal algorithms and temporal difference methods for solving fixed point problems ⋮ Robust shortest path planning and semicontractive dynamic programming ⋮ Fundamental design principles for reinforcement learning algorithms
Cites Work
- (Approximate) iterated successive approximations algorithm for sequential decision processes
- A generalized Kalman filter for fixed point approximation and efficient temporal-difference learning
- Projected equation methods for approximate solution of large linear systems
- Asynchronous stochastic approximation and Q-learning
- Finite state Markovian decision processes
- Q-Learning and Enhanced Policy Iteration in Discounted Dynamic Programming
- Distributed asynchronous computation of fixed points
- Distributed dynamic programming
- An Analysis of Stochastic Shortest Path Problems
- On Stationary Strategies in Borel Dynamic Programming
- Asynchronous Iterative Methods for Multiprocessors
- On the Convergence of Stochastic Iterative Dynamic Programming Algorithms
- Optimal stopping of Markov processes: Hilbert space theory, approximation algorithms, and an application to pricing high-dimensional financial derivatives
- On Boundedness of Q-Learning Iterates for Stochastic Shortest Path Problems
- Neuro-Dynamic Programming: An Overview and Recent Results
- Discrete Dynamic Programming with Sensitive Discount Optimality Criteria