Interval Markov Decision Processes with Continuous Action-Spaces

DOI: 10.1145/3575870.3587117; arXiv: 2211.01231; OpenAlex: W4363671540; MaRDI QID: Q6202090; FDO: Q6202090

Authors: Giannis Delimpaltadakis, Morteza Lahijanian, M. Mazo Jr., Luca Laurenti

Publication date: 21 February 2024

Published in: Proceedings of the 26th ACM International Conference on Hybrid Systems: Computation and Control

Abstract: Interval Markov Decision Processes (IMDPs) are finite-state uncertain Markov models, where the transition probabilities belong to intervals. Recently, there has been a surge of research on employing IMDPs as abstractions of stochastic systems for control synthesis. However, due to the absence of algorithms for synthesis over IMDPs with continuous action-spaces, the action-space is assumed discrete a priori, which is a restrictive assumption for many applications. Motivated by this, we introduce continuous-action IMDPs (caIMDPs), where the bounds on transition probabilities are functions of the action variables, and study value iteration for maximizing expected cumulative rewards. Specifically, we decompose the max-min problem associated with value iteration into $|\mathcal{Q}|$ max problems, where $|\mathcal{Q}|$ is the number of states of the caIMDP. Then, exploiting the simple form of these max problems, we identify cases where value iteration over caIMDPs can be solved efficiently (e.g., with linear or convex programming). We also gain other interesting insights: e.g., in certain cases where the action set $\mathcal{A}$ is a polytope, synthesis over a discrete-action IMDP, where the actions are the vertices of $\mathcal{A}$, is sufficient for optimality. We demonstrate our results on a numerical example. Finally, we include a short discussion on employing caIMDPs as abstractions for control synthesis.
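The value-iteration step mentioned in the abstract can be illustrated with a small sketch. It assumes a finite-horizon, undiscounted update $V_{k+1}(q) = r(q) + \max_{a \in \mathcal{A}} \min_{p} \sum_{q'} p(q') V_k(q')$ with a state-based reward $r$, where $p$ ranges over distributions whose entries lie in action-dependent intervals. The inner minimum is solved with the standard greedy rule for interval sets (start from the lower bounds, then push the remaining mass onto the lowest-valued states). This is not the paper's decomposition into $|\mathcal{Q}|$ max problems. As an illustrative assumption, $\mathcal{A}$ is a triangle written in barycentric coordinates and the bounds are affine in the action (convex combinations of three interval sets, one per vertex). In that case the inner minimum is the optimal value of a linear program whose constraint bounds are affine in $a$, which makes it convex in $a$ by LP duality, so its maximum over $\mathcal{A}$ is attained at a vertex. The sketch checks numerically that a dense action grid and the three vertices give the same values; it does not claim these are the exact conditions used in the paper. The functions `worst_case_expectation` and `value_iteration` and the random model are hypothetical.

```python
import numpy as np


def worst_case_expectation(lower, upper, values):
    """Return min_p sum_j p[j] * values[j] over distributions with lower <= p <= upper.

    Greedy rule for interval sets: start from the lower bounds, then assign the
    leftover probability mass to the lowest-valued successor states first,
    each up to its upper bound. Assumes sum(lower) <= 1 <= sum(upper).
    """
    p = lower.copy()
    remaining = 1.0 - lower.sum()
    for j in np.argsort(values):  # ascending order: worst successors first
        if remaining <= 0.0:
            break
        extra = min(upper[j] - lower[j], remaining)
        p[j] += extra
        remaining -= extra
    return float(p @ values)


def value_iteration(actions, lowers, uppers, reward, horizon):
    """Finite-horizon robust value iteration over a finite list of actions.

    Each action is a weight vector w (barycentric coordinates in the action
    polytope); the interval bounds of state q are w @ lowers[:, q, :] and
    w @ uppers[:, q, :], i.e. affine in the action (hypothetical model).
    """
    n = reward.shape[0]
    V = np.zeros(n)
    for _ in range(horizon):
        V_next = np.empty(n)
        for q in range(n):
            best = -np.inf
            for w in actions:
                lo = w @ lowers[:, q, :]
                up = w @ uppers[:, q, :]
                best = max(best, worst_case_expectation(lo, up, V))
            V_next[q] = reward[q] + best
        V = V_next
    return V


# Hypothetical random caIMDP: 4 states, triangular action set (3 vertices).
rng = np.random.default_rng(0)
n_states, n_vertices, delta = 4, 3, 0.1
nominal = rng.dirichlet(np.ones(n_states), size=(n_vertices, n_states))
lowers = np.clip(nominal - delta, 0.0, 1.0)  # lowers[v, q, q']
uppers = np.clip(nominal + delta, 0.0, 1.0)
reward = rng.uniform(0.0, 1.0, size=n_states)

# Dense barycentric grid over the triangle (it includes the three vertices).
N = 10
grid = [np.array([i, j, N - i - j]) / N for i in range(N + 1) for j in range(N + 1 - i)]
vertices = list(np.eye(n_vertices))

V_grid = value_iteration(grid, lowers, uppers, reward, horizon=5)
V_vert = value_iteration(vertices, lowers, uppers, reward, horizon=5)
print(V_grid)
print(np.allclose(V_grid, V_vert))
```

Outside the affine-bounds assumption, maximizing over a finite sample of $\mathcal{A}$ only yields a lower bound on the optimal value in general, because the robust Bellman update is monotone and the sample maximum never exceeds the maximum over all of $\mathcal{A}$.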

Full work available at URL: https://arxiv.org/abs/2211.01231

zbMATH Keywords

value iteration; control synthesis; planning under uncertainty; bounded-parameter Markov decision processes; uncertain Markov decision processes

Mathematics Subject Classification ID

Formal languages and automata (68Q45); Specification and verification (program logics, model checking, etc.) (68Q60); Control/observation systems governed by functional relations other than differential equations (such as hybrid and switching systems) (93C30)
