Average cost temporal-difference learning
Recommendations
- On average versus discounted reward temporal-difference learning
- Learning algorithms for Markov decision processes with average cost
- Reinforcement learning based algorithms for average cost Markov decision processes
- Reinforcement learning for long-run average cost
- Model-based average reward reinforcement learning
- Kernel-based reinforcement learning in average-cost problems
- Differential Temporal Difference Learning
- Average reward reinforcement learning: foundations, algorithms, and empirical results
- Linear least-squares algorithms for temporal difference learning
Cited in (20)
- Projected equation methods for approximate solution of large linear systems
- Finite-sample analysis of nonlinear stochastic approximation with applications in reinforcement learning
- On average versus discounted reward temporal-difference learning
- Derivatives of logarithmic stationary distributions for policy gradient reinforcement learning
- Risk-Sensitive Reinforcement Learning via Policy Gradient Search
- A time aggregation approach to Markov decision processes
- Scalable Reinforcement Learning for Multiagent Networked Systems
- Natural actor-critic algorithms
- Multiscale Q-learning with linear function approximation
- A stability criterion for two timescale stochastic approximation schemes
- Efficient multi-objective reinforcement learning via multiple-gradient descent with iteratively discovered weight-vector sets
- An online actor-critic algorithm with function approximation for constrained Markov decision processes
- Hyperbolically Discounted Temporal Difference Learning
- Reinforcement learning based algorithms for average cost Markov decision processes
- Approximate policy iteration: a survey and some new methods
- Actor-critic algorithms with online feature adaptation
- Adaptive data-aware utility-based scheduling in resource-constrained systems
- Finite-time performance of distributed temporal-difference learning with linear function approximation
- Long-Term Reward Prediction in TD Models of the Dopamine System
- Fundamental design principles for reinforcement learning algorithms
This page was built for publication: Average cost temporal-difference learning
MaRDI item Q1805802