Reinforcement learning based algorithms for average cost Markov decision processes
DOI: 10.1007/s10626-006-0003-y · zbMath: 1146.90521 · OpenAlex: W2061769118 · MaRDI QID: Q2643632
Mohammed Shahid Abdulla, Shalabh Bhatnagar
Publication date: 27 August 2007
Published in: Discrete Event Dynamic Systems
Full work available at URL: https://doi.org/10.1007/s10626-006-0003-y
Keywords: Markov decision processes; reinforcement learning; policy iteration; actor-critic algorithms; simultaneous perturbation stochastic approximation; normalized Hadamard matrices; TD-learning; two-timescale stochastic approximation
MSC classification: Learning and adaptive systems in artificial intelligence (68T05); Markov and semi-Markov decision processes (90C40)
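For orientation, the keywords above name the main ingredients of the paper's algorithms. The sketch below is a rough illustration only, not a reproduction of the authors' method: it combines a one-measurement SPSA gradient estimate (cf. the cited one-measurement SPSA work) with an average-cost TD(0) critic in a two-timescale arrangement, on a hypothetical toy MDP. All names, step sizes, and constants are assumptions.

```python
# Minimal sketch (assumption-laden, NOT the paper's algorithm): a
# two-timescale actor-critic for an average-cost MDP, with the actor
# driven by a one-measurement SPSA gradient estimate and the critic
# running average-cost TD(0) on the faster timescale.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy MDP: P[a, s, s'] transition probabilities, cost[s, a].
nS, nA = 4, 2
P = rng.dirichlet(np.ones(nS), size=(nA, nS))
cost = rng.uniform(size=(nS, nA))

def softmax_policy(theta, s):
    """Randomized policy: action probabilities from per-state logits."""
    z = np.exp(theta[s] - theta[s].max())
    return z / z.sum()

def critic_pass(theta, s, n_steps):
    """Fast timescale: average-cost TD(0) under the given policy.

    Returns the average-cost estimate rho and the final state."""
    rho, h = 0.0, np.zeros(nS)  # average cost and differential values
    for k in range(1, n_steps + 1):
        a = rng.choice(nA, p=softmax_policy(theta, s))
        s_next = rng.choice(nS, p=P[a, s])
        td = cost[s, a] - rho + h[s_next] - h[s]  # average-cost TD error
        h[s] += td / k
        rho += (cost[s, a] - rho) / k
        s = s_next
    return rho, s

theta = np.zeros((nS, nA))  # actor parameters (logits)
delta = 0.1                 # SPSA perturbation magnitude
s = 0
for n in range(1, 201):
    a_n = 1.0 / n           # slow (actor) step size
    # Rademacher (+/-1) perturbations here; the keywords suggest the paper
    # instead uses deterministic perturbations from normalized Hadamard
    # matrices.
    Delta = rng.choice([-1.0, 1.0], size=theta.shape)
    # Critic evaluates the perturbed policy between actor updates.
    rho_plus, s = critic_pass(theta + delta * Delta, s, n_steps=500)
    # One-measurement SPSA gradient estimate:
    # d(rho)/d(theta_i) ~ rho_plus / (delta * Delta_i).
    theta -= a_n * rho_plus / (delta * Delta)

print("average cost after tuning:", critic_pass(theta, s, 5_000)[0])
```

The two-timescale structure is what makes the scheme coherent: the critic takes many TD steps per actor update, so it is effectively equilibrated relative to the slowly moving actor, which licenses treating rho_plus as a measurement of the perturbed policy's average cost.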
Related Items
A constrained optimization perspective on actor-critic algorithms and application to network routing, Multiscale Q-learning with linear function approximation, Natural actor-critic algorithms
Cites Work
- A one-measurement form of simultaneous perturbation stochastic approximation
- Dynamic programming and stochastic control
- Actor-critic algorithms for hierarchical Markov decision processes
- Average cost temporal-difference learning
- Multivariate stochastic approximation using a simultaneous perturbation gradient approximation
- An analysis of temporal-difference learning with function approximation
- Asynchronous Stochastic Approximations
- On Actor-Critic Algorithms
- Actor-Critic-Type Learning Algorithms for Markov Decision Processes
- The O.D.E. Method for Convergence of Stochastic Approximation and Reinforcement Learning
- A Simultaneous Perturbation Stochastic Approximation-Based Actor–Critic Algorithm for Markov Decision Processes
- The actor-critic algorithm as multi-time-scale stochastic approximation.