The O.D.E. Method for Convergence of Stochastic Approximation and Reinforcement Learning


DOI: 10.1137/S0363012997331639 · zbMath: 0990.62071 · MaRDI QID: Q4943730

Vivek S. Borkar, Sean P. Meyn

Publication date: 19 March 2000

Published in: SIAM Journal on Control and Optimization




Related Items

Some Limit Properties of Markov Chains Induced by Recursive Stochastic Algorithms
Accelerated and Instance-Optimal Policy Evaluation with Linear Function Approximation
An online prediction algorithm for reinforcement learning with linear function approximation using cross entropy method
A new learning algorithm for optimal stopping
An information-theoretic analysis of return maximization in reinforcement learning
Multiscale Q-learning with linear function approximation
A sojourn-based approach to semi-Markov reinforcement learning
Online calibrated forecasts: memory efficiency versus universality for learning in games
Learning to control a structured-prediction decoder for detection of HTTP-layer DDoS attackers
Reinforcement learning based algorithms for average cost Markov decision processes
An adaptive optimization scheme with satisfactory transient performance
Distributed Stochastic Approximation with Local Projections
Oja's algorithm for graph clustering, Markov spectral decomposition, and risk sensitive control
A Diffusion Approximation Theory of Momentum Stochastic Gradient Descent in Nonconvex Optimization
Stochastic recursive inclusions with non-additive iterate-dependent Markov noise
A stability criterion for two timescale stochastic approximation schemes
Stochastic approximation with long range dependent and heavy tailed noise
An actor-critic algorithm with function approximation for discounted cost constrained Markov decision processes
Error bounds for constant step-size \(Q\)-learning
A Small Gain Analysis of Single Timescale Actor Critic
Risk-Sensitive Reinforcement Learning via Policy Gradient Search
Variance-constrained actor-critic algorithms for discounted and average reward MDPs
Convergence of stochastic approximation via martingale and converse Lyapunov methods
A Discrete-Time Switching System Analysis of Q-Learning
On the sample complexity of actor-critic method for reinforcement learning with function approximation
Gradient temporal-difference learning for off-policy evaluation using emphatic weightings
Target Network and Truncation Overcome the Deadly Triad in \(\boldsymbol{Q}\)-Learning
Multi-agent natural actor-critic reinforcement learning algorithms
A Two-Time-Scale Stochastic Optimization Framework with Applications in Control and Reinforcement Learning
Two-timescale stochastic gradient descent in continuous time with applications to joint online parameter estimation and optimal sensor placement
Asymptotic bias of stochastic gradient search
An online actor-critic algorithm with function approximation for constrained Markov decision processes
Approachability in Stackelberg stochastic games with vector costs
On stochastic gradient and subgradient methods with adaptive steplength sequences
Stabilization of stochastic approximation by step size adaptation
Technical Note—Consistency Analysis of Sequential Learning Under Approximate Bayesian Inference
Reinforcement learning for long-run average cost
Deceptive Reinforcement Learning Under Adversarial Manipulations on Cost Signals
Q-learning for Markov decision processes with a satisfiability criterion
Is Temporal Difference Learning Optimal? An Instance-Dependent Analysis
Popularity signals in trial-offer markets with social influence and position bias
Quasi-Newton smoothed functional algorithms for unconstrained and constrained simulation optimization
Simultaneous perturbation Newton algorithms for simulation optimization
Charge-based control of DiffServ-like queues
The Borkar-Meyn theorem for asynchronous stochastic approximations
Convergence and convergence rate of stochastic gradient search in the case of multiple and non-isolated extrema
Avoidance of traps in stochastic approximation
Linear stochastic approximation driven by slowly varying Markov chains
Boundedness of iterates in \(Q\)-learning
A sensitivity formula for risk-sensitive cost and the actor-critic algorithm
Cooperative dynamics and Wardrop equilibria
Empirical Dynamic Programming
On the convergence of stochastic approximations under a subgeometric ergodic Markov dynamic
Multi-armed bandits based on a variant of simulated annealing
Event-driven stochastic approximation
A stochastic primal-dual method for optimization with conditional value at risk constraints
Nonlinear Gossip
Concentration bounds for temporal difference learning with linear function approximation: the case of batch data and uniform sampling
Stochastic Recursive Inclusions in Two Timescales with Nonadditive Iterate-Dependent Markov Noise
Model-Free Reinforcement Learning for Stochastic Parity Games
Revisiting the ODE method for recursive algorithms: fast convergence using quasi stochastic approximation
An ODE method to prove the geometric convergence of adaptive stochastic algorithms
Finite-Time Performance of Distributed Temporal-Difference Learning with Linear Function Approximation
Natural actor-critic algorithms
A Finite Time Analysis of Temporal Difference Learning with Linear Function Approximation
Non-asymptotic error bounds for constant stepsize stochastic approximation for tracking mobile agents
What may lie ahead in reinforcement learning
Fundamental design principles for reinforcement learning algorithms
Stability of annealing schemes and related processes
Finite-sample analysis of nonlinear stochastic approximation with applications in reinforcement learning
Convergence of Recursive Stochastic Algorithms Using Wasserstein Divergence
Analyzing Approximate Value Iteration Algorithms
Iterative learning control using faded measurements without system information: a gradient estimation approach