Average cost temporal-difference learning
From MaRDI portal
Publication:1805802
DOI: 10.1016/S0005-1098(99)00099-0
zbMATH Open: 0932.93085
MaRDI QID: Q1805802
FDO: Q1805802
Authors: John N. Tsitsiklis, Benjamin Van Roy
Publication date: 28 February 2000
Published in: Automatica
Recommendations
- On average versus discounted reward temporal-difference learning
- Learning algorithms for Markov decision processes with average cost
- Reinforcement learning based algorithms for average cost Markov decision processes
- Reinforcement learning for long-run average cost.
- Model-based average reward reinforcement learning
- Kernel-based reinforcement learning in average-cost problems
- Differential Temporal Difference Learning
- Average reward reinforcement learning: foundations, algorithms, and empirical results
- Linear least-squares algorithms for temporal difference learning
Dynamic programming (90C39); Dynamic programming in optimal control and differential games (49L20); Optimal stochastic control (93E20); Stochastic learning and adaptive control (93E35)
Cited In (20)
- Projected equation methods for approximate solution of large linear systems
- Finite-sample analysis of nonlinear stochastic approximation with applications in reinforcement learning
- On average versus discounted reward temporal-difference learning
- Derivatives of logarithmic stationary distributions for policy gradient reinforcement learning
- Risk-Sensitive Reinforcement Learning via Policy Gradient Search
- A time aggregation approach to Markov decision processes
- Scalable Reinforcement Learning for Multiagent Networked Systems
- Efficient multi-objective reinforcement learning via multiple-gradient descent with iteratively discovered weight-vector sets
- A stability criterion for two timescale stochastic approximation schemes
- Natural actor-critic algorithms
- Multiscale Q-learning with linear function approximation
- An online actor-critic algorithm with function approximation for constrained Markov decision processes
- Hyperbolically Discounted Temporal Difference Learning
- Reinforcement learning based algorithms for average cost Markov decision processes
- Approximate policy iteration: a survey and some new methods
- Actor-critic algorithms with online feature adaptation
- Adaptive data-aware utility-based scheduling in resource-constrained systems
- Finite-time performance of distributed temporal-difference learning with linear function approximation
- Long-Term Reward Prediction in TD Models of the Dopamine System
- Fundamental design principles for reinforcement learning algorithms