Model-free reinforcement learning for branching Markov decision processes
From MaRDI portal
Publication:832301
DOI10.1007/978-3-030-81688-9_30zbMATH Open1493.93060arXiv2106.06777OpenAlexW3184305164MaRDI QIDQ832301FDOQ832301
Ernst Moritz Hahn, Dominik Wojtczak, Ashutosh Trivedi, Fabio Somenzi, Mateo Perez, Sven Schewe
Publication date: 25 March 2022
Abstract: We study reinforcement learning for the optimal control of Branching Markov Decision Processes (BMDPs), a natural extension of (multitype) Branching Markov Chains (BMCs). The state of a (discrete-time) BMCs is a collection of entities of various types that, while spawning other entities, generate a payoff. In comparison with BMCs, where the evolution of a each entity of the same type follows the same probabilistic pattern, BMDPs allow an external controller to pick from a range of options. This permits us to study the best/worst behaviour of the system. We generalise model-free reinforcement learning techniques to compute an optimal control strategy of an unknown BMDP in the limit. We present results of an implementation that demonstrate the practicality of the approach.
Full work available at URL: https://arxiv.org/abs/2106.06777
Recommendations
- Polynomial time algorithms for branching Markov decision processes and probabilistic min(max) polynomial Bellman equations
- Model-Free Reinforcement Learning for Lexicographic Omega-Regular Objectives
- Reinforcement learning in finite MDPs: PAC analysis
- Markov decision processes with arbitrary reward processes
- Bounded-parameter Markov decision processes
Branching processes (Galton-Watson, birth-and-death, etc.) (60J80) Markov and semi-Markov decision processes (90C40) Stochastic learning and adaptive control (93E35)
Cites Work
- Title not available (Why is that?)
- \({\mathcal Q}\)-learning
- Branching Processes
- Title not available (Why is that?)
- Recursive Markov chains, stochastic grammars, and monotone systems of nonlinear equations
- Title not available (Why is that?)
- Title not available (Why is that?)
- A multiple time interval finite state projection algorithm for the solution to the chemical master equation
- A strongly polynomial algorithm for criticality of branching processes and consistency of stochastic context-free grammars
- Title not available (Why is that?)
- Estimation for Discrete Time Branching Processes with Application to Epidemics
- Classical Galton-Watson branching process and vaccination
- Growth Optimality for Branching Markov Decision Chains
- Greatest fixed points of probabilistic min/max polynomial equations, and reachability for branching Markov decision processes
- Recursive stochastic games with positive rewards
- Recursive Markov decision processes and recursive stochastic games
- Stabilization of branching queueing networks
- Model Checking Stochastic Branching Processes
- On Probabilistic Parallel Programs with Process Creation and Synchronisation
- Title not available (Why is that?)
- Optimization of Multitype Branching Processes
- Polynomial Time Algorithms for Branching Markov Decision Processes and Probabilistic Min(Max) Polynomial Bellman Equations
Cited In (3)
Uses Software
This page was built for publication: Model-free reinforcement learning for branching Markov decision processes
Report a bug (only for logged in users!)Click here to report a bug for this page (MaRDI item Q832301)