Model-free reinforcement learning for branching Markov decision processes

Publication:832301

DOI: 10.1007/978-3-030-81688-9_30
zbMATH Open: 1493.93060
arXiv: 2106.06777
OpenAlex: W3184305164
MaRDI QID: Q832301
FDO: Q832301

Ernst Moritz Hahn, Dominik Wojtczak, Ashutosh Trivedi, Fabio Somenzi, Mateo Perez, Sven Schewe

Publication date: 25 March 2022

Abstract: We study reinforcement learning for the optimal control of Branching Markov Decision Processes (BMDPs), a natural extension of (multitype) Branching Markov Chains (BMCs). The state of a (discrete-time) BMC is a collection of entities of various types that, while spawning other entities, generate a payoff. In contrast to BMCs, where the evolution of each entity of the same type follows the same probabilistic pattern, BMDPs allow an external controller to pick from a range of options. This permits us to study the best/worst behaviour of the system. We generalise model-free reinforcement learning techniques to compute an optimal control strategy of an unknown BMDP in the limit. We present results of an implementation that demonstrate the practicality of the approach.
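
Illustration (not from the paper): the setting described in the abstract suggests a per-type Bellman-style recursion, where an entity's value is its immediate payoff plus the values of the children it spawns, maximised over the controller's actions. The following Python sketch is purely illustrative and is not the authors' algorithm: it applies a Q-learning-style stochastic-approximation update to a hypothetical two-type BMDP whose payoffs and offspring distributions are invented, and it adds a discount factor on children solely to keep the toy estimates bounded.

import random
from collections import defaultdict

# Hypothetical toy BMDP: two entity types, two actions per type.
# Taking an action yields an immediate payoff and spawns a random
# multiset of child entities (possibly none). All numbers are invented.
def step(entity_type, action, rng):
    if entity_type == 0:
        if action == 0:
            return 1.0, ([0] if rng.random() < 0.4 else [])
        return 0.5, ([1, 1] if rng.random() < 0.3 else [1])
    if action == 0:
        return 2.0, []
    return 0.0, ([0] if rng.random() < 0.5 else [])

TYPES, ACTIONS = [0, 1], [0, 1]
GAMMA = 0.9      # discount on children's values; only to keep the toy bounded
EPSILON = 0.1    # exploration rate
rng = random.Random(0)

Q = defaultdict(float)     # Q[(type, action)]: estimated expected total payoff
counts = defaultdict(int)

def greedy(t):
    return max(ACTIONS, key=lambda a: Q[(t, a)])

for episode in range(50_000):
    t = rng.choice(TYPES)                         # sample an entity type
    a = rng.choice(ACTIONS) if rng.random() < EPSILON else greedy(t)
    payoff, children = step(t, a, rng)
    # Target: immediate payoff plus the (discounted) current value
    # estimate of every spawned child under its best action.
    target = payoff + GAMMA * sum(max(Q[(c, b)] for b in ACTIONS) for c in children)
    counts[(t, a)] += 1
    alpha = 1.0 / counts[(t, a)]                  # decaying step size
    Q[(t, a)] += alpha * (target - Q[(t, a)])

for t in TYPES:
    print(f"type {t}: best action {greedy(t)}, value {max(Q[(t, a)] for a in ACTIONS):.3f}")

On this toy model the loop settles, for each type, on the action with the larger estimated expected total payoff; the paper's contribution is making model-free updates of this flavour sound for genuine BMDPs and showing convergence to an optimal control strategy in the limit.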


Full work available at URL: https://arxiv.org/abs/2106.06777




