Average cost temporal-difference learning




Publication:1805802


DOI: 10.1016/S0005-1098(99)00099-0

zbMATH: 0932.93085

MaRDI QID: Q1805802

Authors: John N. Tsitsiklis, Benjamin Van Roy

Publication date: 28 February 2000

Published in: Automatica

zbMATH Keywords

convergence; dynamic programming; learning; mixing time; average cost; aperiodic Markov chain

Mathematics Subject Classification ID

49L20: Dynamic programming in optimal control and differential games

90C39: Dynamic programming

93E20: Optimal stochastic control

93E35: Stochastic learning and adaptive control

Related Items

Long-Term Reward Prediction in TD Models of the Dopamine System
Finite-Time Performance of Distributed Temporal-Difference Learning with Linear Function Approximation
Scalable Reinforcement Learning for Multiagent Networked Systems
Risk-Sensitive Reinforcement Learning via Policy Gradient Search
Efficient Multi-objective Reinforcement Learning via Multiple-gradient Descent with Iteratively Discovered Weight-Vector Sets
Derivatives of Logarithmic Stationary Distributions for Policy Gradient Reinforcement Learning
Actor-Critic Algorithms with Online Feature Adaptation
Multiscale Q-learning with linear function approximation
An online actor-critic algorithm with function approximation for constrained Markov decision processes
Adaptive data-aware utility-based scheduling in resource-constrained systems
Projected equation methods for approximate solution of large linear systems
Natural actor-critic algorithms
A time aggregation approach to Markov decision processes
Fundamental design principles for reinforcement learning algorithms
Finite-sample analysis of nonlinear stochastic approximation with applications in reinforcement learning
A stability criterion for two timescale stochastic approximation schemes
Reinforcement learning based algorithms for average cost Markov decision processes
Approximate policy iteration: a survey and some new methods
Hyperbolically Discounted Temporal Difference Learning
