An analysis of temporal-difference learning with function approximation
From MaRDI portal
Publication: 4362297
DOI: 10.1109/9.580874
zbMATH Open: 0914.93075
OpenAlex: W2139418546
MaRDI QID: Q4362297
FDO: Q4362297
Authors: Benjamin Van Roy, John N. Tsitsiklis
Publication date: 6 May 1999
Published in: IEEE Transactions on Automatic Control
Full work available at URL: https://doi.org/10.1109/9.580874
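The paper analyzes temporal-difference learning with linear function approximation and shows that, under on-policy sampling and diminishing step sizes, the iterate converges to the solution of a projected Bellman equation. The following is a minimal illustrative sketch (not the authors' code) of TD(0) on a hypothetical 3-state Markov reward process; all numerical values (`P`, `r`, `gamma`, `Phi`) are made up for illustration, and the iterate is compared against the projected fixed point that characterizes the limit.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical Markov reward process (values chosen for illustration only).
P = np.array([[0.5, 0.5, 0.0],       # transition matrix
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])
r = np.array([1.0, 0.0, -1.0])       # per-state rewards
gamma = 0.9                          # discount factor
Phi = np.array([[1.0, 0.0],          # fixed feature matrix, one row per state
                [0.5, 0.5],
                [0.0, 1.0]])

# TD(0) with linear function approximation: V(s) ~ phi(s)' theta.
theta = np.zeros(2)
s = 0
for t in range(100_000):
    s_next = rng.choice(3, p=P[s])
    # TD error: delta_t = r(s_t) + gamma * phi(s_{t+1})' theta - phi(s_t)' theta
    delta = r[s] + gamma * Phi[s_next] @ theta - Phi[s] @ theta
    alpha = 1.0 / (1.0 + t / 100)    # diminishing step sizes (Robbins-Monro)
    theta = theta + alpha * delta * Phi[s]
    s = s_next

# Projected fixed point: Phi' D (Phi - gamma P Phi) theta* = Phi' D r,
# where D = diag(stationary distribution of P).
A_pi = np.vstack([P.T - np.eye(3), np.ones((1, 3))])
pi = np.linalg.lstsq(A_pi, np.array([0.0, 0.0, 0.0, 1.0]), rcond=None)[0]
D = np.diag(pi)
A = Phi.T @ D @ (Phi - gamma * P @ Phi)
b = Phi.T @ D @ r
theta_star = np.linalg.solve(A, b)

# The stochastic iterate should end up close to the projected fixed point.
print("TD iterate:", theta, " projected fixed point:", theta_star)
```

The comparison target is the fixed point of the projected Bellman operator under the stationary-distribution-weighted norm, which is the limit identified by the paper's convergence theorem for the on-policy case.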
Recommendations
- On the convergence of temporal-difference learning with linear function approximation
- Average cost temporal-difference learning
- Least squares policy evaluation algorithms with linear function approximation
- Asymptotic analysis of temporal-difference learning algorithms with constant step-sizes
Mathematics Subject Classification:
- Markov chains (discrete-time Markov processes on discrete state spaces) (60J10)
- Stochastic learning and adaptive control (93E35)
Cited In (97)
- Projected equation methods for approximate solution of large linear systems
- Neural circuits for learning context-dependent associations of stimuli
- Flow shop scheduling with reinforcement learning
- Rationality and intelligence
- A Q-learning predictive control scheme with guaranteed stability
- Variance-constrained actor-critic algorithms for discounted and average reward MDPs
- Relational Sequence Learning
- An incremental off-policy search in a model-free Markov decision process using a single sample path
- Concentration bounds for temporal difference learning with linear function approximation: the case of batch data and uniform sampling
- A reinforcement learning adaptive fuzzy controller for robots.
- An online prediction algorithm for reinforcement learning with linear function approximation using cross entropy method
- Restricted gradient-descent algorithm for value-function approximation in reinforcement learning
- Adaptive importance sampling for control and inference
- A formal framework and extensions for function approximation in learning classifier systems
- Temporal difference-based policy iteration for optimal control of stochastic systems
- Solving average cost Markov decision processes by means of a two-phase time aggregation algorithm
- The Borkar-Meyn theorem for asynchronous stochastic approximations
- Quadratic approximate dynamic programming for input-affine systems
- From Reinforcement Learning to Deep Reinforcement Learning: An Overview
- Asymptotic analysis of value prediction by well-specified and misspecified models
- A review on deep reinforcement learning for fluid mechanics
- Natural actor-critic algorithms
- Multiscale Q-learning with linear function approximation
- Bias and variance approximation in value function estimates
- Q-Learning with Linear Function Approximation
- Least squares temporal difference methods: An analysis under general conditions
- Accelerated and Instance-Optimal Policy Evaluation with Linear Function Approximation
- On the convergence of temporal-difference learning with linear function approximation
- High-order fully actuated system approaches. VIII: Optimal control with application in spacecraft attitude stabilisation
- Perspectives of approximate dynamic programming
- Approximate dynamic programming for link scheduling in wireless mesh networks
- Least squares policy iteration with instrumental variables vs. direct policy search: comparison against optimal benchmarks using energy storage
- A \(Sarsa(\lambda)\) algorithm based on double-layer fuzzy reasoning
- The single-node dynamic service scheduling and dispatching problem
- Off-policy temporal difference learning with distribution adaptation in fast mixing chains
- Basis function adaptation in temporal difference reinforcement learning
- Approximate Q Learning for Controlled Diffusion Processes and Its Near Optimality
- Hybrid MDP based integrated hierarchical Q-learning
- Reinforcement distribution in fuzzy Q-learning
- Transmission scheduling for multi-process multi-sensor remote estimation via approximate dynamic programming
- Continuous-time robust dynamic programming
- Real-time reinforcement learning by sequential actor-critics and experience replay
- A tutorial on linear function approximators for dynamic programming and reinforcement learning
- An actor-critic algorithm for constrained Markov decision processes
- Energy contracts management by stochastic programming techniques
- Stochastic approximation algorithms: overview and recent trends.
- Reinforcement learning based algorithms for average cost Markov decision processes
- An actor-critic algorithm with function approximation for discounted cost constrained Markov decision processes
- Title not available
- On the existence of fixed points for approximate value iteration and temporal-difference learning
- Deep reinforcement learning for inventory control: a roadmap
- A review of stochastic algorithms with continuous value function approximation and some new approximate policy iteration algorithms for multidimensional continuous applications
- Approximate policy iteration: a survey and some new methods
- Robust reinforcement learning control with static and dynamic stability
- Stochastic approximation
- Deep exploration via randomized value functions
- Finite-time performance of distributed temporal-difference learning with linear function approximation
- A finite time analysis of temporal difference learning with linear function approximation
- Reinforcement learning algorithms with function approximation: recent advances and applications
- Fundamental design principles for reinforcement learning algorithms
- Proximal algorithms and temporal difference methods for solving fixed point problems
- From infinite to finite programs: explicit error bounds with applications to approximate dynamic programming
- A generalized Kalman filter for fixed point approximation and efficient temporal-difference learning
- Convergence of stochastic approximation via martingale and converse Lyapunov methods
- On the Asymptotic Equivalence Between Differential Hebbian and Temporal Difference Learning
- Stochastic recursive inclusions with non-additive iterate-dependent Markov noise
- Uncovering instabilities in variational-quantum deep Q-networks
- Concentration of Contractive Stochastic Approximation and Reinforcement Learning
- Gradient temporal-difference learning for off-policy evaluation using emphatic weightings
- Risk-Sensitive Reinforcement Learning via Policy Gradient Search
- Target Network and Truncation Overcome the Deadly Triad in \(\boldsymbol{Q}\)-Learning
- Convergence of entropy-regularized natural policy gradient with linear function approximation
- A Small Gain Analysis of Single Timescale Actor Critic
- On the sample complexity of actor-critic method for reinforcement learning with function approximation
- Bayesian exploration for approximate dynamic programming
- Premium control with reinforcement learning
- Simple and optimal methods for stochastic variational inequalities. II: Markovian noise and policy evaluation in reinforcement learning
- Adaptive critic design with graph Laplacian for online learning control of nonlinear systems
- Asymptotic analysis of temporal-difference learning algorithms with constant step-sizes
- Variance regularization in sequential Bayesian optimization
- Chaotic dynamics and convergence analysis of temporal difference algorithms with bang-bang control
- Online Bootstrap Inference For Policy Evaluation In Reinforcement Learning
- On-policy concurrent reinforcement learning
- Finite-time convergence rates of distributed local stochastic approximation
- A functional model method for nonconvex nonsmooth conditional stochastic optimization
- Is Temporal Difference Learning Optimal? An Instance-Dependent Analysis
- An approximate dynamic programming approach to the admission control of elective patients
- Optimal policy evaluation using kernel-based temporal difference methods
- Eligibility traces and forgetting factor in recursive least-squares-based temporal difference
- A Lyapunov-based version of the value iteration algorithm formulated as a discrete-time switched affine system
- Toward nonlinear local reinforcement learning rules through neuroevolution
- Actor-critic algorithms with online feature adaptation
- Finite-time error bounds for distributed linear stochastic approximation
- Full gradient DQN reinforcement learning: a provably convergent scheme
- Parallel dynamic water supply scheduling in a cluster of computers