Reinforcement learning based algorithms for average cost Markov decision processes
Publication: 2643632
DOI: 10.1007/s10626-006-0003-y · zbMath: 1146.90521 · OpenAlex: W2061769118 · MaRDI QID: Q2643632
Mohammed Shahid Abdulla, Shalabh Bhatnagar
Publication date: 27 August 2007
Published in: Discrete Event Dynamic Systems
Full work available at URL: https://doi.org/10.1007/s10626-006-0003-y
Keywords: Markov decision processes; reinforcement learning; policy iteration; actor-critic algorithms; simultaneous perturbation stochastic approximation; normalized Hadamard matrices; TD-learning; two timescale stochastic approximation
MSC: Learning and adaptive systems in artificial intelligence (68T05); Markov and semi-Markov decision processes (90C40)
Related Items (3)
Cites Work
- A one-measurement form of simultaneous perturbation stochastic approximation
- Dynamic programming and stochastic control
- Actor-critic algorithms for hierarchical Markov decision processes
- Average cost temporal-difference learning
- Multivariate stochastic approximation using a simultaneous perturbation gradient approximation
- An analysis of temporal-difference learning with function approximation
- Asynchronous Stochastic Approximations
- On Actor-Critic Algorithms
- Actor-Critic-Type Learning Algorithms for Markov Decision Processes
- The O.D.E. Method for Convergence of Stochastic Approximation and Reinforcement Learning
- A Simultaneous Perturbation Stochastic Approximation-Based Actor–Critic Algorithm for Markov Decision Processes
- The actor-critic algorithm as multi-time-scale stochastic approximation.