Problem solving in the context of artificial intelligence (heuristics, search strategies, etc.) (68T20) Probability in computer science (algorithm analysis, random structures, phase transitions, etc.) (68Q87) Specification and verification (program logics, model checking, etc.) (68Q60) Markov and semi-Markov decision processes (90C40)
Abstract: Partially observable Markov decision processes (POMDPs) are a well-known stochastic model for sequential decision making under limited information. We consider the EXPTIME-hard problem of synthesising policies that almost-surely reach some goal state without ever visiting a bad state. In particular, we are interested in computing the winning region, that is, the set of system configurations from which a policy exists that satisfies the reachability specification. A direct application of such a winning region is the safe exploration of POMDPs, for instance by restricting the behaviour of a reinforcement learning agent to the region. We present two algorithms: a novel iterative SAT-based approach and a decision-diagram-based alternative. An empirical evaluation demonstrates the feasibility and efficacy of both approaches.
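Almost-sure (qualitative) reach-avoid analysis depends only on the supports of the transition distributions, not on the exact probabilities. As a minimal illustration of the winning-region idea, the sketch below computes the almost-sure reach-avoid region for a fully observable MDP via the standard greatest-fixed-point iteration; the paper's SAT-based and decision-diagram-based algorithms instead work on belief supports of a POMDP, which this toy deliberately omits. All names and the example MDP are illustrative, not taken from the paper.

```python
def almost_sure_reach_avoid(states, actions, succ, goal, bad):
    """Winning region for 'reach goal with probability 1 while avoiding bad'.

    succ maps (state, action) to the *support* of the transition
    distribution; the qualitative analysis never needs the probabilities.
    """
    goal, bad = set(goal), set(bad)
    W = set(states) - bad
    while True:
        # An action is safe in s if its whole support stays inside W.
        safe = {s: [a for a in actions
                    if (s, a) in succ and succ[(s, a)] <= W]
                for s in W}
        # Least fixed point: states that can reach the goal via safe actions.
        reach = goal & W
        changed = True
        while changed:
            changed = False
            for s in W - reach:
                if any(succ[(s, a)] & reach for a in safe[s]):
                    reach.add(s)
                    changed = True
        if reach == W:          # greatest fixed point found
            return W
        W = reach               # otherwise shrink the candidate region

# Toy MDP: 'trap' can only move to the bad state, so it is losing;
# from s0 and s1 a policy reaches g almost surely while avoiding b.
succ = {
    ('s0', 'a'): {'s1'}, ('s0', 'b'): {'b'},
    ('s1', 'a'): {'g', 's0'}, ('s1', 'b'): {'s1'},
    ('g', 'a'): {'g'},
    ('b', 'a'): {'b'},
    ('trap', 'a'): {'b'},
}
W = almost_sure_reach_avoid(['s0', 's1', 'g', 'b', 'trap'],
                            ['a', 'b'], succ, ['g'], ['b'])
```

Here `W` evaluates to `{'s0', 's1', 'g'}`: the trap state is pruned in the first outer iteration because no action keeps it inside the candidate region.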
Recommendations
- Optimal cost almost-sure reachability in POMDPs
- Robust almost-sure reachability in multi-environment MDPs
- Verification of Indefinite-Horizon POMDPs
- POMDPs under probabilistic semantics
- Reachability in MDPs: refining convergence of value iteration
- Reachability in Recursive Markov Decision Processes
- Reachability in continuous-time Markov reward decision processes
- Approximability and efficient algorithms for constrained fixed-horizon POMDPs with durative actions
Cites work
- scientific article; zbMATH DE number 1509479 (no title available)
- scientific article; zbMATH DE number 7559459 (no title available)
- scientific article; zbMATH DE number 5585443 (no title available)
- scientific article; zbMATH DE number 5493266 (no title available)
- scientific article; zbMATH DE number 2238781 (no title available)
- A General Safety Framework for Learning-Based Control in Uncertain Robotic Systems
- A comprehensive survey on safe reinforcement learning
- Algorithms for Omega-Regular Games with Imperfect Information
- Deep reinforcement learning with temporal logics
- Minimal counterexamples for linear-time probabilistic verification
- Omega-Regular Objectives in Model-Free Reinforcement Learning
- Optimal cost almost-sure reachability in POMDPs
- Permissive controller synthesis for probabilistic systems
- Planning and acting in partially observable stochastic domains
- Probabilistic ω-automata
- Qualitative analysis of partially-observable Markov decision processes
- Shield synthesis: runtime enforcement for reactive systems
- Temporal logic motion planning using POMDPs with parity objectives
- Verification and control of partially observable probabilistic systems
Cited in (9)
- Supervisor synthesis of POMDP via automata learning
- Task-guided IRL in POMDPs that scales
- Risk-aware shielding of partially observable Monte Carlo planning policies
- Under-approximating expected total rewards in POMDPs
- Search and explore: symbiotic policy synthesis in POMDPs
- Task-aware verifiable RNN-based policies for partially observable Markov decision processes
- Optimal cost almost-sure reachability in POMDPs
- Verification of Indefinite-Horizon POMDPs
- Robust almost-sure reachability in multi-environment MDPs
This page was built for publication: Enforcing almost-sure reachability in POMDPs
MaRDI item: Q832296