Two Time-Scale Stochastic Approximation with Controlled Markov Noise and Off-Policy Temporal-Difference Learning
Publication: Q5219302
DOI: 10.1287/moor.2017.0855
zbMath: 1434.62174
arXiv: 1503.09105
OpenAlex: W2963940330
MaRDI QID: Q5219302
Shalabh Bhatnagar, Prasenjit Karmakar
Publication date: 11 March 2020
Published in: Mathematics of Operations Research
Full work available at URL: https://arxiv.org/abs/1503.09105
Keywords: asymptotic convergence; temporal-difference learning; Markov noise; two time-scale stochastic approximation
MSC classifications: Discrete-time Markov processes on general state spaces (60J05); Stochastic approximation (62L20); Stochastic learning and adaptive control (93E35)
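The title refers to a pair of coupled stochastic recursions run with step sizes of different orders, driven by a Markov-modulated noise process. Below is a minimal sketch of such a two time-scale iteration. The mean fields h and g, the two-state noise chain, and the step-size exponents are illustrative assumptions chosen for a toy example, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two-state Markov "noise" chain Z_n with a fixed transition matrix;
# this toy chain stands in for the paper's controlled Markov noise.
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])

def step_markov(z):
    return rng.choice(2, p=P[z])

# Illustrative mean fields (not from the paper): the fast iterate y
# tracks x on its time scale, while the slow iterate x drifts toward 0.
def h(x, y, z):   # drives the slow iterate
    return -x + 0.1 * (z - 0.5)

def g(x, y, z):   # drives the fast iterate
    return (x - y) + 0.1 * (z - 0.5)

x, y, z = 1.0, 0.0, 0
for n in range(1, 100_000):
    a_n = 1.0 / n          # slow step size: a(n)/b(n) -> 0
    b_n = 1.0 / n ** 0.6   # fast step size
    z = step_markov(z)
    x += a_n * h(x, y, z)
    y += b_n * g(x, y, z)

print(x, y)  # y tracks x; both settle near the chain-induced bias point
```

Because a(n)/b(n) -> 0, the fast iterate y effectively equilibrates between updates of the slow iterate x, which is the separation-of-time-scales structure analyzed in this line of work.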
Related Items (5)
- A Two-Timescale Stochastic Algorithm Framework for Bilevel Optimization: Complexity Analysis and Application to Actor-Critic
- A Two-Time-Scale Stochastic Optimization Framework with Applications in Control and Reinforcement Learning
- On Generalized Bellman Equations and Temporal-Difference Learning
- Stochastic Recursive Inclusions in Two Timescales with Nonadditive Iterate-Dependent Markov Noise
- Whittle index based Q-learning for restless bandits with average reward
Cites Work
- Stochastic approximation with two time scales
- Convergence and convergence rate of stochastic gradient search in the case of multiple and non-isolated extrema
- Basis function adaptation in temporal difference reinforcement learning
- Linear stochastic approximation driven by slowly varying Markov chains
- Stochastic approximation with `controlled Markov' noise
- Stochastic approximations for finite-state Markov chains
- Applications of a Kushner and Clark lemma to general classes of stochastic algorithms
- On Actor-Critic Algorithms
- Least Squares Temporal Difference Methods: An Analysis under General Conditions
- Stochastic Approximations and Differential Inclusions