An analysis of temporal-difference learning with function approximation
From MaRDI portal
Publication: 4362297
DOI: 10.1109/9.580874
zbMATH Open: 0914.93075
OpenAlex: W2139418546
MaRDI QID: Q4362297
FDO: Q4362297
Authors: Benjamin Van Roy, John N. Tsitsiklis
Publication date: 6 May 1999
Published in: IEEE Transactions on Automatic Control
Full work available at URL: https://doi.org/10.1109/9.580874
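The paper analyzes temporal-difference learning with linear function approximation and shows that, under on-policy sampling and diminishing step sizes, the iterate converges to the solution of a projected Bellman equation. The following is a minimal illustrative sketch (not the authors' code) of TD(0) on a hypothetical 3-state Markov reward process; all numerical values (`P`, `r`, `gamma`, `Phi`) are made up for illustration, and the iterate is compared against the projected fixed point that characterizes the limit.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical Markov reward process (values chosen for illustration only).
P = np.array([[0.5, 0.5, 0.0],       # transition matrix
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])
r = np.array([1.0, 0.0, -1.0])       # per-state rewards
gamma = 0.9                          # discount factor
Phi = np.array([[1.0, 0.0],          # fixed feature matrix, one row per state
                [0.5, 0.5],
                [0.0, 1.0]])

# TD(0) with linear function approximation: V(s) ~ phi(s)' theta.
theta = np.zeros(2)
s = 0
for t in range(100_000):
    s_next = rng.choice(3, p=P[s])
    # TD error: delta_t = r(s_t) + gamma * phi(s_{t+1})' theta - phi(s_t)' theta
    delta = r[s] + gamma * Phi[s_next] @ theta - Phi[s] @ theta
    alpha = 1.0 / (1.0 + t / 100)    # diminishing step sizes (Robbins-Monro)
    theta = theta + alpha * delta * Phi[s]
    s = s_next

# Projected fixed point: Phi' D (Phi - gamma P Phi) theta* = Phi' D r,
# where D = diag(stationary distribution of P).
A_pi = np.vstack([P.T - np.eye(3), np.ones((1, 3))])
pi = np.linalg.lstsq(A_pi, np.array([0.0, 0.0, 0.0, 1.0]), rcond=None)[0]
D = np.diag(pi)
A = Phi.T @ D @ (Phi - gamma * P @ Phi)
b = Phi.T @ D @ r
theta_star = np.linalg.solve(A, b)

# The stochastic iterate should end up close to the projected fixed point.
print("TD iterate:", theta, " projected fixed point:", theta_star)
```

The comparison target is the fixed point of the projected Bellman operator under the stationary-distribution-weighted norm, which is the limit identified by the paper's convergence theorem for the on-policy case.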
Recommendations
- On the convergence of temporal-difference learning with linear function approximation
- Average cost temporal-difference learning
- Least squares policy evaluation algorithms with linear function approximation
- Asymptotic analysis of temporal-difference learning algorithms with constant step-sizes
Mathematics Subject Classification:
- Markov chains (discrete-time Markov processes on discrete state spaces) (60J10)
- Stochastic learning and adaptive control (93E35)
Cited In (97)
- Projected equation methods for approximate solution of large linear systems
- Neural circuits for learning context-dependent associations of stimuli
- Flow shop scheduling with reinforcement learning
- Rationality and intelligence
- A Q-learning predictive control scheme with guaranteed stability
- Variance-constrained actor-critic algorithms for discounted and average reward MDPs
- Relational Sequence Learning
- An incremental off-policy search in a model-free Markov decision process using a single sample path
- Concentration bounds for temporal difference learning with linear function approximation: the case of batch data and uniform sampling
- A reinforcement learning adaptive fuzzy controller for robots.
- An online prediction algorithm for reinforcement learning with linear function approximation using cross entropy method
- Restricted gradient-descent algorithm for value-function approximation in reinforcement learning
- Adaptive importance sampling for control and inference
- A formal framework and extensions for function approximation in learning classifier systems
- Temporal difference-based policy iteration for optimal control of stochastic systems
- Solving average cost Markov decision processes by means of a two-phase time aggregation algorithm
- The Borkar-Meyn theorem for asynchronous stochastic approximations
- Quadratic approximate dynamic programming for input-affine systems
- From Reinforcement Learning to Deep Reinforcement Learning: An Overview
- Asymptotic analysis of value prediction by well-specified and misspecified models
- A review on deep reinforcement learning for fluid mechanics
- Natural actor-critic algorithms
- Multiscale Q-learning with linear function approximation
- Bias and variance approximation in value function estimates
- Q-Learning with Linear Function Approximation
- Least squares temporal difference methods: An analysis under general conditions
- Accelerated and Instance-Optimal Policy Evaluation with Linear Function Approximation
- On the convergence of temporal-difference learning with linear function approximation
- High-order fully actuated system approaches. VIII: Optimal control with application in spacecraft attitude stabilisation
- Perspectives of approximate dynamic programming
- Approximate dynamic programming for link scheduling in wireless mesh networks
- Least squares policy iteration with instrumental variables vs. direct policy search: comparison against optimal benchmarks using energy storage
- A \(Sarsa(\lambda)\) algorithm based on double-layer fuzzy reasoning
- The single-node dynamic service scheduling and dispatching problem
- Off-policy temporal difference learning with distribution adaptation in fast mixing chains
- Basis function adaptation in temporal difference reinforcement learning
- Approximate Q Learning for Controlled Diffusion Processes and Its Near Optimality
- Hybrid MDP based integrated hierarchical Q-learning
- Reinforcement distribution in fuzzy Q-learning
- Transmission scheduling for multi-process multi-sensor remote estimation via approximate dynamic programming
- Continuous-time robust dynamic programming
- Real-time reinforcement learning by sequential actor-critics and experience replay
- A tutorial on linear function approximators for dynamic programming and reinforcement learning
- An actor-critic algorithm for constrained Markov decision processes
- Energy contracts management by stochastic programming techniques
- Stochastic approximation algorithms: overview and recent trends.
- Reinforcement learning based algorithms for average cost Markov decision processes
- An actor-critic algorithm with function approximation for discounted cost constrained Markov decision processes
- Title not available
- On the existence of fixed points for approximate value iteration and temporal-difference learning
- Deep reinforcement learning for inventory control: a roadmap
- A review of stochastic algorithms with continuous value function approximation and some new approximate policy iteration algorithms for multidimensional continuous applications
- Approximate policy iteration: a survey and some new methods
- Robust reinforcement learning control with static and dynamic stability
- Stochastic approximation
- Deep exploration via randomized value functions
- Finite-time performance of distributed temporal-difference learning with linear function approximation
- A finite time analysis of temporal difference learning with linear function approximation
- Reinforcement learning algorithms with function approximation: recent advances and applications
- Fundamental design principles for reinforcement learning algorithms
- Proximal algorithms and temporal difference methods for solving fixed point problems
- From infinite to finite programs: explicit error bounds with applications to approximate dynamic programming
- A generalized Kalman filter for fixed point approximation and efficient temporal-difference learning
- Convergence of stochastic approximation via martingale and converse Lyapunov methods
- On the Asymptotic Equivalence Between Differential Hebbian and Temporal Difference Learning
- Stochastic recursive inclusions with non-additive iterate-dependent Markov noise
- Uncovering instabilities in variational-quantum deep Q-networks
- Concentration of Contractive Stochastic Approximation and Reinforcement Learning
- Gradient temporal-difference learning for off-policy evaluation using emphatic weightings
- Risk-Sensitive Reinforcement Learning via Policy Gradient Search
- Target Network and Truncation Overcome the Deadly Triad in \(\boldsymbol{Q}\)-Learning
- Convergence of entropy-regularized natural policy gradient with linear function approximation
- A Small Gain Analysis of Single Timescale Actor Critic
- On the sample complexity of actor-critic method for reinforcement learning with function approximation
- Bayesian exploration for approximate dynamic programming
- Premium control with reinforcement learning
- Simple and optimal methods for stochastic variational inequalities. II: Markovian noise and policy evaluation in reinforcement learning
- Adaptive critic design with graph Laplacian for online learning control of nonlinear systems
- Asymptotic analysis of temporal-difference learning algorithms with constant step-sizes
- Variance regularization in sequential Bayesian optimization
- Chaotic dynamics and convergence analysis of temporal difference algorithms with bang-bang control
- Online Bootstrap Inference For Policy Evaluation In Reinforcement Learning
- On-policy concurrent reinforcement learning
- Finite-time convergence rates of distributed local stochastic approximation
- A functional model method for nonconvex nonsmooth conditional stochastic optimization
- Is Temporal Difference Learning Optimal? An Instance-Dependent Analysis
- An approximate dynamic programming approach to the admission control of elective patients
- Optimal policy evaluation using kernel-based temporal difference methods
- Eligibility traces and forgetting factor in recursive least-squares-based temporal difference
- A Lyapunov-based version of the value iteration algorithm formulated as a discrete-time switched affine system
- Toward nonlinear local reinforcement learning rules through neuroevolution
- Actor-critic algorithms with online feature adaptation
- Finite-time error bounds for distributed linear stochastic approximation
- Full gradient DQN reinforcement learning: a provably convergent scheme
- Parallel dynamic water supply scheduling in a cluster of computers