Reinforcement learning based algorithms for average cost Markov decision processes
Publication: 2643632
DOI: 10.1007/s10626-006-0003-y · zbMath: 1146.90521 · OpenAlex: W2061769118 · MaRDI QID: Q2643632
Mohammed Shahid Abdulla, Shalabh Bhatnagar
Publication date: 27 August 2007
Published in: Discrete Event Dynamic Systems
Full work available at URL: https://doi.org/10.1007/s10626-006-0003-y
Keywords: Markov decision processes; reinforcement learning; policy iteration; actor-critic algorithms; simultaneous perturbation stochastic approximation; normalized Hadamard matrices; TD-learning; two timescale stochastic approximation
MSC: Learning and adaptive systems in artificial intelligence (68T05); Markov and semi-Markov decision processes (90C40)
Related Items (3)
Cites Work
- A one-measurement form of simultaneous perturbation stochastic approximation
- Dynamic programming and stochastic control
- Actor-critic algorithms for hierarchical Markov decision processes
- Average cost temporal-difference learning
- Multivariate stochastic approximation using a simultaneous perturbation gradient approximation
- An analysis of temporal-difference learning with function approximation
- Asynchronous Stochastic Approximations
- On Actor-Critic Algorithms
- Actor-Critic-Type Learning Algorithms for Markov Decision Processes
- The O.D.E. Method for Convergence of Stochastic Approximation and Reinforcement Learning
- A Simultaneous Perturbation Stochastic Approximation-Based Actor–Critic Algorithm for Markov Decision Processes
- The actor-critic algorithm as multi-time-scale stochastic approximation.