Enforcing almost-sure reachability in POMDPs


DOI: 10.1007/978-3-030-81688-9_28
zbMATH Open: 1493.68213
arXiv: 2007.00085
OpenAlex: W3185890095
MaRDI QID: Q832296


Authors: Sebastian Junges, Nils Jansen, Sanjit A. Seshia


Publication date: 25 March 2022

Abstract: Partially observable Markov decision processes (POMDPs) are a well-known stochastic model for sequential decision making under limited information. We consider the EXPTIME-hard problem of synthesising policies that almost-surely reach some goal state without ever visiting a bad state. In particular, we are interested in computing the winning region, that is, the set of system configurations from which a policy exists that satisfies the reachability specification. A direct application of such a winning region is the safe exploration of POMDPs, for instance by restricting the behavior of a reinforcement learning agent to the region. We present two algorithms: a novel SAT-based iterative approach and a decision-diagram-based alternative. The empirical evaluation demonstrates the feasibility and efficacy of the approaches.


Full work available at URL: https://arxiv.org/abs/2007.00085





