Average cost temporal-difference learning
Publication:1805802
DOI: 10.1016/S0005-1098(99)00099-0
zbMath: 0932.93085
MaRDI QID: Q1805802
John N. Tsitsiklis, Benjamin van Roy
Publication date: 28 February 2000
Published in: Automatica
49L20: Dynamic programming in optimal control and differential games
90C39: Dynamic programming
93E20: Optimal stochastic control
93E35: Stochastic learning and adaptive control
Related Items
Long-Term Reward Prediction in TD Models of the Dopamine System
Finite-Time Performance of Distributed Temporal-Difference Learning with Linear Function Approximation
Scalable Reinforcement Learning for Multiagent Networked Systems
Risk-Sensitive Reinforcement Learning via Policy Gradient Search
Efficient Multi-objective Reinforcement Learning via Multiple-gradient Descent with Iteratively Discovered Weight-Vector Sets
Derivatives of Logarithmic Stationary Distributions for Policy Gradient Reinforcement Learning
Actor-Critic Algorithms with Online Feature Adaptation
Multiscale Q-learning with linear function approximation
An online actor-critic algorithm with function approximation for constrained Markov decision processes
Adaptive data-aware utility-based scheduling in resource-constrained systems
Projected equation methods for approximate solution of large linear systems
Natural actor-critic algorithms
A time aggregation approach to Markov decision processes
Fundamental design principles for reinforcement learning algorithms
Finite-sample analysis of nonlinear stochastic approximation with applications in reinforcement learning
A stability criterion for two timescale stochastic approximation schemes
Reinforcement learning based algorithms for average cost Markov decision processes
Approximate policy iteration: a survey and some new methods
Hyperbolically Discounted Temporal Difference Learning