Kernel-based reinforcement learning in average-cost problems
From MaRDI portal
Publication:5267044
Cited in (10)
- Efficient algorithms of pathwise dynamic programming for decision optimization in mining operations
- An algorithmic approach to optimal asset liquidation problems
- Hoeffding's inequality for non-irreducible Markov models
- Average cost temporal-difference learning
- Algorithms for optimal control of stochastic switching systems
- Mean-field controls with Q-learning for cooperative MARL: convergence and complexity analysis
- Kernel dynamic policy programming: applicable reinforcement learning to robot systems with high dimensional states
- On Hoeffding and Bernstein type inequalities for sums of random variables in non-additive measure spaces and complete convergence
- Approximated multi-agent fitted Q iteration
- Hoeffding's inequality for uniformly ergodic Markov chains