Approximate gradient methods in policy-space optimization of Markov reward processes
Publication:1870312
DOI: 10.1023/A:1022145020786; zbMath: 1042.93061; OpenAlex: W1554366315; MaRDI QID: Q1870312
Peter Marbach, John N. Tsitsiklis
Publication date: 11 May 2003
Published in: Discrete Event Dynamic Systems
Full work available at URL: https://doi.org/10.1023/a:1022145020786
Related Items (6)
Environment-driven distributed evolutionary adaptation in a population of autonomous robotic agents ⋮ Modeling and optimization of a product-service system with additional service capacity and impatient customers ⋮ Simulation-based optimization of Markov decision processes: an empirical process theory approach ⋮ Analysis and improvement of policy gradient estimation ⋮ Deep Reinforcement Learning: A State-of-the-Art Walkthrough ⋮ Approximation of average cost Markov decision processes using empirical distributions and concentration inequalities