Undiscounted reinforcement learning algorithm based on performance potentials
Publication:5754334
zbMATH Open: 1123.68359; MaRDI QID: Q5754334; FDO: Q5754334
Publication date: 22 August 2007
Recommendations
- From perturbation analysis to Markov decision processes and reinforcement learning
- Unified NDP method based on TD(0) learning for both average and discounted Markov decision processes
- scientific article; zbMATH DE number 2159039
- Algorithms for reinforcement learning.
- Performance optimization algorithms based on potentials for semi-Markov control processes