On the Convergence of Stochastic Iterative Dynamic Programming Algorithms
Publication:4323346
DOI: 10.1162/neco.1994.6.6.1185
zbMath: 0822.68095
OpenAlex: W2165131254
MaRDI QID: Q4323346
Michael I. Jordan, Tommi S. Jaakkola, Satinder Pal Singh
Publication date: 18 October 1995
Published in: Neural Computation
Full work available at URL: http://hdl.handle.net/1721.1/7205
Related Items (43)
Some Limit Properties of Markov Chains Induced by Recursive Stochastic Algorithms ⋮ Approximate policy iteration: a survey and some new methods ⋮ Cooperation between independent market makers ⋮ Perspectives of approximate dynamic programming ⋮ Restricted gradient-descent algorithm for value-function approximation in reinforcement learning ⋮ Linear least-squares algorithms for temporal difference learning ⋮ On the worst-case analysis of temporal-difference learning algorithms ⋮ Reinforcement learning with replacing eligibility traces ⋮ Q-learning and policy iteration algorithms for stochastic shortest path problems ⋮ A Q-Learning Algorithm for Discrete-Time Linear-Quadratic Control with Random Parameters of Unknown Distribution: Convergence and Stabilization ⋮ Error bounds for constant step-size \(Q\)-learning ⋮ A Discrete-Time Switching System Analysis of Q-Learning ⋮ A novel policy based on action confidence limit to improve exploration efficiency in reinforcement learning ⋮ Unnamed Item ⋮ The optimal unbiased value estimator and its relation to LSTD, TD and MC ⋮ Approximate Q Learning for Controlled Diffusion Processes and Its Near Optimality ⋮ Target Network and Truncation Overcome the Deadly Triad in \(\boldsymbol{Q}\)-Learning ⋮ A lexicographic optimization approach for a bi-objective parallel-machine scheduling problem minimizing total quality loss and total tardiness ⋮ Stochastic Fixed-Point Iterations for Nonexpansive Maps: Convergence and Error Bounds ⋮ Reinforcement learning algorithms with function approximation: recent advances and applications ⋮ Technical Note—Consistency Analysis of Sequential Learning Under Approximate Bayesian Inference ⋮ A unified framework for stochastic optimization ⋮ SOLVING DYNAMIC WILDLIFE RESOURCE OPTIMIZATION PROBLEMS USING REINFORCEMENT LEARNING ⋮ REINFORCEMENT LEARNING WITH GOAL-DIRECTED ELIGIBILITY TRACES ⋮ Deep Reinforcement Learning: A State-of-the-Art Walkthrough ⋮ Full Gradient DQN Reinforcement Learning: A Provably Convergent Scheme ⋮ Asymptotic analysis of temporal-difference learning algorithms with constant step-sizes ⋮ Adaptive Learning Algorithm Convergence in Passive and Reactive Environments ⋮ Is Temporal Difference Learning Optimal? An Instance-Dependent Analysis ⋮ Asymptotic analysis of temporal-difference learning algorithms with constant step-sizes ⋮ The asymptotic equipartition property in reinforcement learning and its relation to return maximization ⋮ Adaptive stock trading with dynamic asset allocation using reinforcement learning ⋮ An optimal control approach to mode generation in hybrid systems ⋮ TD(λ) learning without eligibility traces: a theoretical analysis ⋮ Stochastic approximation algorithms: overview and recent trends. ⋮ Convergence of least squares learning in self-referential discontinuous stochastic models. ⋮ Risk-Averse Approximate Dynamic Programming with Quantile-Based Risk Measures ⋮ Bayesian Exploration for Approximate Dynamic Programming ⋮ Revisiting the ODE method for recursive algorithms: fast convergence using quasi stochastic approximation ⋮ Reinforcement distribution in fuzzy Q-learning ⋮ Empirical Q-Value Iteration ⋮ A simulation-based approach to stochastic dynamic programming ⋮ Stochastic adaptation of importance sampler