A constrained optimization perspective on actor-critic algorithms and application to network routing
From MaRDI portal
Abstract: We propose a novel actor-critic algorithm with guaranteed convergence to an optimal policy for a discounted reward Markov decision process. The actor incorporates a descent direction that is motivated by the solution of a certain non-linear optimization problem. We also discuss an extension to incorporate function approximation and demonstrate the practicality of our algorithms on a network routing application.
Recommendations
- An actor-critic algorithm for constrained Markov decision processes
- An actor-critic algorithm with function approximation for discounted cost constrained Markov decision processes
- An online actor-critic algorithm with function approximation for constrained Markov decision processes
- Natural actor-critic algorithms
- Actor-Critic--Type Learning Algorithms for Markov Decision Processes
Cites work
- scientific article; zbMATH DE number 5348356
- scientific article; zbMATH DE number 1321699
- Actor-Critic--Type Learning Algorithms for Markov Decision Processes
- Natural actor-critic algorithms
- New algorithms of the Q-learning type
- On Actor-Critic Algorithms
- Reinforcement learning based algorithms for average cost Markov decision processes
- Stochastic approximation methods for constrained and unconstrained systems
Cited in (6)
- On linear and super-linear convergence of natural policy gradient algorithm
- An actor-critic algorithm with function approximation for discounted cost constrained Markov decision processes
- Optimal action criterion and algorithm improvement of real-time dynamic programming
- Scalable $\epsilon$-Optimal Decision-Making and Stochastic Routing in Large Networks via Distributed Supervision of Probabilistic Automata
- An online actor-critic algorithm with function approximation for constrained Markov decision processes
- Controller exploitation-exploration reinforcement learning architecture for computing near-optimal policies