Fundamental design principles for reinforcement learning algorithms
Publication:2094028
DOI: 10.1007/978-3-030-60990-0_4
OpenAlex: W3175771377
MaRDI QID: Q2094028
Adithya Devraj, Ana Bušić, Sean P. Meyn
Publication date: 28 October 2022
Full work available at URL: https://doi.org/10.1007/978-3-030-60990-0_4
Related Items
Cites Work
- Oja's algorithm for graph clustering, Markov spectral decomposition, and risk sensitive control
- Q-learning and policy iteration algorithms for stochastic shortest path problems
- A generalized Kalman filter for fixed point approximation and efficient temporal-difference learning
- Stochastic approximation. A dynamical systems viewpoint
- A Newton-Raphson version of the multivariate Robbins-Monro procedure
- New method of stochastic approximation type
- Asynchronous stochastic approximation and Q-learning
- Computable bounds for geometric convergence rates of Markov chains
- Hoeffding's inequality for uniformly ergodic Markov chains
- Average cost temporal-difference learning
- \({\mathcal Q}\)-learning
- Spectral theory and limit theorems for geometrically ergodic Markov processes
- Convergence rate of linear two-time-scale stochastic approximation
- Computable exponential convergence rates for stochastically ordered Markov processes
- Large deviation asymptotics and control variates for simulating large functions
- Concentration inequalities for Markov chains by Marton couplings and spectral methods
- Stochastic approximations for finite-state Markov chains
- Learning Algorithms for Markov Decision Processes with Average Cost
- Tail asymptotics for busy periods
- Applications of a Kushner and Clark lemma to general classes of stochastic algorithms
- Algorithms for Reinforcement Learning
- Markov Chains and Stochastic Stability
- Acceleration of Stochastic Approximation by Averaging
- An analysis of temporal-difference learning with function approximation
- On Actor-Critic Algorithms
- Optimal stopping of Markov processes: Hilbert space theory, approximation algorithms, and an application to pricing high-dimensional financial derivatives
- A Tutorial on Thompson Sampling
- The O.D.E. Method for Convergence of Stochastic Approximation and Reinforcement Learning
- A Finite Time Analysis of Temporal Difference Learning with Linear Function Approximation
- Bandit Algorithms
- A Concentration Bound for Stochastic Approximation via Alekseev’s Formula
- Control Techniques for Complex Networks
- An Extension of the Robbins-Monro Procedure
- Stochastic Estimation of the Maximum of a Regression Function
- A Stochastic Approximation Method
- Multidimensional Stochastic Approximation Methods
- On a Stochastic Approximation Method