Asynchronous stochastic approximation and Q-learning
From MaRDI portal
Publication: 1345139
zbMath: 0820.68105
MaRDI QID: Q1345139
Publication date: 26 February 1995
Published in: Machine Learning
Related Items
Least squares policy iteration with instrumental variables vs. direct policy search: comparison against optimal benchmarks using energy storage
Some Limit Properties of Markov Chains Induced by Recursive Stochastic Algorithms
An information-theoretic analysis of return maximization in reinforcement learning
Multiscale Q-learning with linear function approximation
Value iteration and adaptive dynamic programming for data-driven adaptive optimal control design
Asynchronous stochastic approximation with differential inclusions
Optimal Hour-Ahead Bidding in the Real-Time Electricity Market with Battery Storage Using Approximate Dynamic Programming
Online calibrated forecasts: memory efficiency versus universality for learning in games
Perspectives of approximate dynamic programming
Actor-critic algorithms for hierarchical Markov decision processes
Linear least-squares algorithms for temporal difference learning
Feature-based methods for large scale dynamic programming
Reinforcement learning with replacing eligibility traces
The loss from imperfect value functions in expectation-based and minimax-based tasks
Asymptotics of Reinforcement Learning with Neural Networks
An adaptive learning model with foregone payoff information
Q-learning and policy iteration algorithms for stochastic shortest path problems
A Q-Learning Algorithm for Discrete-Time Linear-Quadratic Control with Random Parameters of Unknown Distribution: Convergence and Stabilization
Fictitious Play in Zero-Sum Stochastic Games
Neural circuits for learning context-dependent associations of stimuli
Stochastic approximation with two time scales
Error bounds for constant step-size \(Q\)-learning
Approximate stochastic annealing for online control of infinite horizon Markov decision processes
A Discrete-Time Switching System Analysis of Q-Learning
On the sample complexity of actor-critic method for reinforcement learning with function approximation
Approximate Q Learning for Controlled Diffusion Processes and Its Near Optimality
Target Network and Truncation Overcome the Deadly Triad in \(\boldsymbol{Q}\)-Learning
A stochastic contraction mapping theorem
Optimal liquidation through a limit order book: a neural network and simulation approach
Stochastic Fixed-Point Iterations for Nonexpansive Maps: Convergence and Error Bounds
Reinforcement learning algorithms with function approximation: recent advances and applications
Underestimation estimators to Q-learning
Independent learning in stochastic games
Iterative learning control for large scale nonlinear systems with observation noise
New algorithms of the Q-learning type
Stabilization of stochastic approximation by step size adaptation
Generalization of a result of Fabian on the asymptotic normality of stochastic approximation
Technical Note—Consistency Analysis of Sequential Learning Under Approximate Bayesian Inference
$Q$-Learning in a Stochastic Stackelberg Game between an Uninformed Leader and a Naive Follower
A unified framework for stochastic optimization
Reinforcement learning for long-run average cost.
Adaptive dynamic programming and optimal control of nonlinear nonaffine systems
On Generalized Bellman Equations and Temporal-Difference Learning
Q-learning algorithms with random truncation bounds and applications to effective parallel computing
Full Gradient DQN Reinforcement Learning: A Provably Convergent Scheme
Q-learning for continuous-time linear systems: A model-free infinite horizon optimal control approach
The asymptotic equipartition property in reinforcement learning and its relation to return maximization
Structural estimation of real options models
An optimal control approach to mode generation in hybrid systems
Boundedness of iterates in \(Q\)-learning
The actor-critic algorithm as multi-time-scale stochastic approximation.
Stochastic approximation algorithms: overview and recent trends.
An Approximate Dynamic Programming Algorithm for Monotone Value Functions
A parallel scheduling algorithm for reinforcement learning in large state space
Continuous-Time Robust Dynamic Programming
Risk-Averse Approximate Dynamic Programming with Quantile-Based Risk Measures
Bayesian Exploration for Approximate Dynamic Programming
Convergence results on stochastic adaptive learning
Revisiting the ODE method for recursive algorithms: fast convergence using quasi stochastic approximation
Reinforcement learning and stochastic optimisation
An application of approximate dynamic programming in multi-period multi-product advertising budgeting
On Convergence of Value Iteration for a Class of Total Cost Markov Decision Processes
Natural actor-critic algorithms
Fundamental design principles for reinforcement learning algorithms
Empirical Q-Value Iteration
Finite-sample analysis of nonlinear stochastic approximation with applications in reinforcement learning
A simulation-based approach to stochastic dynamic programming
A Gentle Introduction to Reinforcement Learning