Finding optimal memoryless policies of POMDPs under the expected average reward criterion
From MaRDI portal
Recommendations
- Finding Optimal Observation-Based Policies for Constrained POMDPs Under the Expected Average Reward Criterion
- Finite-memory strategies in POMDPs with long-run average objectives
- Average optimality for continuous-time Markov decision processes with a policy iteration approach
- On Finding Optimal Policies for Markov Decision Chains: A Unifying Framework for Mean-Variance-Tradeoffs
- The policy iteration algorithm for average reward Markov decision processes with general state space
- Policies without Memory for the Infinite-Armed Bernoulli Bandit under the Average-Reward Criterion
- Policy iteration for bounded-parameter POMDPs
- On the average cost optimality equation and the structure of optimal policies for partially observable Markov decision processes
Cites work
- scientific article; zbMATH DE number 1321699
- scientific article; zbMATH DE number 700091
- scientific article; zbMATH DE number 1753152
- scientific article; zbMATH DE number 1753153
- A survey of algorithmic methods for partially observed Markov decision processes
- Basic ideas for event-based optimization of Markov systems
- Convergence of simulation-based policy iteration
- Event-Based Optimization of Markov Systems
- Optimization of a special case of continuous-time Markov decision processes with compact action set
- Performance optimization algorithms based on potentials for semi-Markov control processes
- Perturbation realization, potentials, and sensitivity analysis of Markov processes
- Potential-Based Online Policy Iteration Algorithms for Markov Decision Processes
- Simulation-based optimization of Markov reward processes
- Stochastic learning and optimization. A sensitivity-based approach.
- The $n$th-Order Bias Optimality for Multichain Markov Decision Processes
- The Optimal Control of Partially Observable Markov Processes over a Finite Horizon
Cited in (6 documents)
- Mean-payoff optimization in continuous-time Markov chains with parametric alarms
- Finite-memory strategies in POMDPs with long-run average objectives
- Geometry of policy improvement
- Centralized Optimization for Dec-POMDPs Under the Expected Average Reward Criterion
- scientific article; zbMATH DE number 7625165
- Future memories are not needed for large classes of POMDPs
This page was built for publication: Finding optimal memoryless policies of POMDPs under the expected average reward criterion