Enforcing almost-sure reachability in POMDPs

Abstract: Partially-Observable Markov Decision Processes (POMDPs) are a well-known stochastic model for sequential decision making under limited information. We consider the EXPTIME-hard problem of synthesising policies that almost-surely reach some goal state without ever visiting a bad state. In particular, we are interested in computing the winning region, that is, the set of system configurations from which a policy exists that satisfies the reachability specification. A direct application of such a winning region is the safe exploration of POMDPs, for instance by restricting the behavior of a reinforcement learning agent to the region. We present two algorithms: a novel SAT-based iterative approach and a decision-diagram-based alternative. The empirical evaluation demonstrates the feasibility and efficacy of the approaches.
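To illustrate the kind of winning region the abstract refers to, the sketch below computes the almost-sure reach-avoid set for the simpler, fully observable MDP case via the classical two-level fixed point (shrink the candidate set to states that have an action whose entire successor support stays inside it and from which the goal remains backward-reachable). This is not the paper's SAT- or decision-diagram-based algorithm, and the POMDP problem it addresses (over belief supports) is substantially harder; all function and variable names here are illustrative assumptions.

```python
def almost_sure_reach_avoid(states, trans, goal, bad):
    """Almost-sure reach-avoid winning states of a fully observable MDP.

    trans: dict mapping state -> dict mapping action -> set of possible
    successors (the support of the transition distribution; exact
    probabilities are irrelevant for almost-sure properties).
    Illustrative sketch only; not the algorithm from the paper.
    """
    win = set(states) - set(bad)          # candidate winning set
    while True:
        # Keep only actions whose whole successor support stays in `win`.
        safe = {s: [a for a, succ in trans.get(s, {}).items() if succ <= win]
                for s in win}
        # Backward reachability to the goal using only safe actions.
        reach = set(goal) & win
        changed = True
        while changed:
            changed = False
            for s in win - reach:
                if any(trans[s][a] & reach for a in safe[s]):
                    reach.add(s)
                    changed = True
        if reach == win:                  # greatest fixed point reached
            return win
        win = reach                       # shrink candidate set and repeat


# Tiny example: from s0 the goal g is reachable almost surely via action
# "a"; s1 can only move into the bad state, so it loses.
mdp = {
    "s0": {"a": {"g"}, "b": {"s0", "bad"}},
    "s1": {"a": {"bad"}},
    "g": {"stay": {"g"}},
    "bad": {"stay": {"bad"}},
}
win = almost_sure_reach_avoid(mdp.keys(), mdp, goal={"g"}, bad={"bad"})
```

On this toy MDP the computation excludes `bad` and `s1` and keeps `s0` and `g`. For the POMDP setting studied in the paper, the analogous computation runs over belief supports (sets of possible states), which is what makes the problem EXPTIME-hard.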





This page was built for publication: Enforcing almost-sure reachability in POMDPs
