Enforcing almost-sure reachability in POMDPs


DOI: 10.1007/978-3-030-81688-9_28
zbMATH Open: 1493.68213
arXiv: 2007.00085
OpenAlex: W3185890095
MaRDI QID: Q832296


Authors: Sebastian Junges, Nils Jansen, Sanjit A. Seshia


Publication date: 25 March 2022

Abstract: Partially observable Markov decision processes (POMDPs) are a well-known stochastic model for sequential decision making under limited information. We consider the EXPTIME-hard problem of synthesising policies that almost-surely reach some goal state without ever visiting a bad state. In particular, we are interested in computing the winning region, that is, the set of system configurations from which a policy exists that satisfies the reachability specification. A direct application of such a winning region is the safe exploration of POMDPs, for instance by restricting the behavior of a reinforcement learning agent to the region. We present two algorithms: a novel SAT-based iterative approach and a decision-diagram-based alternative. The empirical evaluation demonstrates the feasibility and efficacy of the approaches.


Full work available at URL: https://arxiv.org/abs/2007.00085





