On the Convergence of Stochastic Iterative Dynamic Programming Algorithms

DOI: 10.1162/neco.1994.6.6.1185; zbMath: 0822.68095; OpenAlex: W2165131254; MaRDI QID: Q4323346

Michael I. Jordan, Tommi S. Jaakkola, Satinder Pal Singh

Publication date: 18 October 1995

Published in: Neural Computation

Full work available at URL: http://hdl.handle.net/1721.1/7205

zbMATH Keywords

\(Q\)-learning algorithm

Mathematics Subject Classification ID

Learning and adaptive systems in artificial intelligence (68T05) Stochastic programming (90C15)

Related Items (43)

Some Limit Properties of Markov Chains Induced by Recursive Stochastic Algorithms ⋮ Approximate policy iteration: a survey and some new methods ⋮ Cooperation between independent market makers ⋮ Perspectives of approximate dynamic programming ⋮ Restricted gradient-descent algorithm for value-function approximation in reinforcement learning ⋮ Linear least-squares algorithms for temporal difference learning ⋮ On the worst-case analysis of temporal-difference learning algorithms ⋮ Reinforcement learning with replacing eligibility traces ⋮ Q-learning and policy iteration algorithms for stochastic shortest path problems ⋮ A Q-Learning Algorithm for Discrete-Time Linear-Quadratic Control with Random Parameters of Unknown Distribution: Convergence and Stabilization ⋮ Error bounds for constant step-size \(Q\)-learning ⋮ A Discrete-Time Switching System Analysis of Q-Learning ⋮ A novel policy based on action confidence limit to improve exploration efficiency in reinforcement learning ⋮ Unnamed Item ⋮ The optimal unbiased value estimator and its relation to LSTD, TD and MC ⋮ Approximate Q Learning for Controlled Diffusion Processes and Its Near Optimality ⋮ Target Network and Truncation Overcome the Deadly Triad in \(\boldsymbol{Q}\)-Learning ⋮ A lexicographic optimization approach for a bi-objective parallel-machine scheduling problem minimizing total quality loss and total tardiness ⋮ Stochastic Fixed-Point Iterations for Nonexpansive Maps: Convergence and Error Bounds ⋮ Reinforcement learning algorithms with function approximation: recent advances and applications ⋮ Technical Note—Consistency Analysis of Sequential Learning Under Approximate Bayesian Inference ⋮ A unified framework for stochastic optimization ⋮ SOLVING DYNAMIC WILDLIFE RESOURCE OPTIMIZATION PROBLEMS USING REINFORCEMENT LEARNING ⋮ REINFORCEMENT LEARNING WITH GOAL-DIRECTED ELIGIBILITY TRACES ⋮ Deep Reinforcement Learning: A State-of-the-Art Walkthrough ⋮ Full Gradient DQN Reinforcement Learning: A Provably Convergent Scheme ⋮ Asymptotic analysis of temporal-difference learning algorithms with constant step-sizes ⋮ Adaptive Learning Algorithm Convergence in Passive and Reactive Environments ⋮ Is Temporal Difference Learning Optimal? An Instance-Dependent Analysis ⋮ Asymptotic analysis of temporal-difference learning algorithms with constant step-sizes ⋮ The asymptotic equipartition property in reinforcement learning and its relation to return maximization ⋮ Adaptive stock trading with dynamic asset allocation using reinforcement learning ⋮ An optimal control approach to mode generation in hybrid systems ⋮ TD(λ) learning without eligibility traces: a theoretical analysis ⋮ Stochastic approximation algorithms: overview and recent trends. ⋮ Convergence of least squares learning in self-referential discontinuous stochastic models. ⋮ Risk-Averse Approximate Dynamic Programming with Quantile-Based Risk Measures ⋮ Bayesian Exploration for Approximate Dynamic Programming ⋮ Revisiting the ODE method for recursive algorithms: fast convergence using quasi stochastic approximation ⋮ Reinforcement distribution in fuzzy Q-learning ⋮ Empirical Q-Value Iteration ⋮ A simulation-based approach to stochastic dynamic programming ⋮ Stochastic adaptation of importance sampler

Cites Work

A Stochastic Approximation Method
