Some limit properties of Markov chains induced by recursive stochastic algorithms
From MaRDI portal
Publication:5037552
Abstract: Recursive stochastic algorithms have gained significant attention recently due to data-driven applications. Examples include stochastic gradient descent for solving large-scale optimization problems and empirical dynamic programming algorithms for solving Markov decision problems. These recursive stochastic algorithms approximate certain contraction operators and can be viewed within the framework of iterated random operators. Accordingly, we consider iterated random operators over a Polish space that simulate an iterated contraction operator over that space. Assume that the iterated random operators are indexed by certain batch sizes such that, as the batch sizes grow to infinity, each realization of the random operator converges (in some sense) to the contraction operator it simulates. We show that, starting from the same initial condition, the distribution of the random sequence generated by the iterated random operators converges weakly to the trajectory generated by the contraction operator. We further show that under certain conditions, the time average of the random sequence converges to the spatial mean of the invariant distribution. We then apply these results to logistic regression, empirical value iteration, and empirical Q-value iteration for finite state, finite action MDPs to illustrate the general theory developed here.
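The empirical value iteration the abstract refers to can be illustrated with a minimal sketch: the expectation inside the Bellman (contraction) operator is replaced by a sample average over a batch of simulated next states, giving one realization of the random operator. The toy two-state MDP below (transition matrix `P`, rewards `R`) is a hypothetical example, not data from the paper; as `batch_size` grows, each realization of `empirical_bellman` approaches the exact Bellman operator.

```python
import numpy as np

def empirical_bellman(V, P, R, gamma, batch_size, rng):
    """One realization of the random operator: the expectation
    E[V(s')] in the Bellman operator is replaced by a sample
    average over `batch_size` next states drawn from P(. | s, a)."""
    n_states, n_actions = R.shape
    V_next = np.empty((n_states, n_actions))
    for s in range(n_states):
        for a in range(n_actions):
            samples = rng.choice(n_states, size=batch_size, p=P[s, a])
            V_next[s, a] = V[samples].mean()
    # Greedy maximization over actions, as in exact value iteration.
    return (R + gamma * V_next).max(axis=1)

# Toy 2-state, 2-action MDP (illustrative numbers only).
rng = np.random.default_rng(0)
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.1, 0.9]]])  # P[s, a, s']
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])                # R[s, a]
gamma = 0.9

# Iterating the random operator yields the Markov chain studied
# in the paper; its iterates hover near the exact fixed point.
V = np.zeros(2)
for _ in range(200):
    V = empirical_bellman(V, P, R, gamma, batch_size=64, rng=rng)
```

Iterating this random operator produces exactly the kind of Markov chain the paper analyzes: the iterates do not converge pointwise, but their distribution (and, under the stated conditions, their time average) stabilizes near the fixed point of the exact contraction.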
Recommendations
- scientific article; zbMATH DE number 3980968
- scientific article; zbMATH DE number 7733450
- Stochastic recursive inclusions with non-additive iterate-dependent Markov noise
- On the convergence of Markovian stochastic algorithms with rapidly decreasing ergodicity rates
- Stochastic Approximation and Large Deviations: Upper Bounds and w.p.1 Convergence
Cites work
- scientific article; zbMATH DE number 5348356
- scientific article; zbMATH DE number 1193442
- scientific article; zbMATH DE number 3560401
- scientific article; zbMATH DE number 1325008
- scientific article; zbMATH DE number 1321699
- scientific article; zbMATH DE number 1354815
- scientific article; zbMATH DE number 700091
- scientific article; zbMATH DE number 1972910
- scientific article; zbMATH DE number 2015372
- scientific article; zbMATH DE number 6860839
- scientific article; zbMATH DE number 2235418
- A Universal Empirical Dynamic Programming Algorithm for Continuous State MDPs
- A finite time analysis of temporal difference learning with linear function approximation
- A primer on monotone operator methods
- Abstract dynamic programming
- Acceleration of Stochastic Approximation by Averaging
- Adaptivity of averaged stochastic gradient descent to local strong convexity for logistic regression
- Approximate iterative algorithms
- Approximations of Dynamic Programs, I
- Approximations of Dynamic Programs, II
- Asymptotic Behavior of a Markovian Stochastic Algorithm with Constant Step
- Asynchronous stochastic approximation and Q-learning
- Bridging the gap between constant step size stochastic gradient descent and Markov chains
- Convergence of simulation-based policy iteration
- Concentration of measure inequalities in information theory, communications, and coding
- Convergence of stochastic processes
- Convergence results for single-step on-policy reinforcement-learning algorithms
- Distributed asynchronous deterministic and stochastic gradient optimization algorithms
- Empirical \(Q\)-value iteration
- Empirical dynamic programming
- Ergodicity and central limit theorems for a class of Markov processes
- Error bounds for constant step-size \(Q\)-learning
- Finite-time bounds for fitted value iteration
- Global convergence of policy gradient methods to (almost) locally optimal policies
- Harder, Better, Faster, Stronger Convergence Rates for Least-Squares Regression
- Infinite dimensional analysis. A hitchhiker's guide.
- Iterated Random Functions
- Large-scale machine learning with stochastic gradient descent
- Markov Chains and Stochastic Stability
- Metropolis-Type Annealing Algorithms for Global Optimization in $\mathbb{R}^d $
- Minimizing finite sums with the stochastic average gradient
- On the Convergence of Stochastic Iterative Dynamic Programming Algorithms
- Parallelizing stochastic gradient descent for least squares regression: mini-batching, averaging, and model misspecification
- Performance guarantees for empirical Markov decision processes with applications to multiperiod inventory models
- Probability Inequalities for Sums of Bounded Random Variables
- Q-learning and enhanced policy iteration in discounted dynamic programming
- Real analysis and applications
- Real analysis on intervals
- Recursive Stochastic Algorithms for Global Optimization in $\mathbb{R}^d $
- Reinforcement learning. An introduction
- Riemannian Stochastic Variance Reduced Gradient Algorithm with Retraction and Vector Transport
- Robust Stochastic Approximation Approach to Stochastic Programming
- Simulation-based optimization of Markov decision processes: an empirical process theory approach
- Stochastic Approximation for Nonexpansive Maps: Application to Q-Learning Algorithms
- Stochastic Gradient Descent on Riemannian Manifolds
- The O.D.E. Method for Convergence of Stochastic Approximation and Reinforcement Learning
- The Strong Law of Large Numbers for a Class of Markov Chains
- The concentration of measure phenomenon
- Weak convergence and empirical processes. With applications to statistics
- Weak convergence of a sequence of Markov chains