Simulation-based algorithms for Markov decision processes. (Q870662)

    Statements

    Simulation-based algorithms for Markov decision processes. (English)
    Publication date: 13 March 2007
    This book describes algorithms for computing value functions and optimal policies for discounted Markov decision processes (MDPs) via simulation. The basic question addressed in the monograph is how to compute optimal policies and/or value function estimates when the expected one-step rewards and transition probabilities can only be simulated rather than given in explicit form. The monograph consists of five chapters.

    Chapter 1, Markov Decision Processes, introduces the policy iteration and value iteration algorithms for finite-horizon discounted MDPs and the rolling horizon approach to infinite-horizon discounted MDPs.

    Chapter 2, Multi-Stage Sampling Algorithms, discusses finite-horizon problems. Given a total number of simulations allowed at a given state, it provides algorithms for allocating these simulations among the admissible actions in order to find good policies and/or value function estimates.

    Chapter 3, Population-based Evolutionary Approaches, is devoted to versions of the policy iteration algorithm for the case when the sets of available actions are so large that policy iteration cannot be carried out in explicit form. Simulation is used to generate policies for the policy improvement step. Most of this chapter deals with the situation in which the transition probabilities and one-step rewards are given explicitly; the results are then extended to the situation in which these values are simulated.

    Chapter 4, Model Reference Adaptive Search, introduces and describes the global optimization technique named in the chapter title, abbreviated MRAS. MRAS is presented first for deterministic optimization problems, then for stochastic optimization problems, and finally for MDPs. When used to solve MDPs, MRAS randomly generates admissible policies from some distribution on the policy space; the distribution is then updated and new policies are generated. The sequence of distributions should converge to a distribution concentrated at a single point, and this point is an optimal solution. As the authors indicate, MRAS is closely related to the cross-entropy method.

    Chapter 5, On-line Control Methods via Simulation, deals with applications of the rolling horizon approach to action selection in real time. This chapter describes and investigates approximate rolling horizon algorithms based on simulation.
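    To make the simulation-based setting concrete, the sketch below illustrates, in the spirit of the multi-stage sampling algorithms of Chapter 2, how a finite-horizon value function can be estimated when one-step rewards and transitions are available only through a simulator. The simulate(state, action) interface, the uniform allocation of samples across actions, and the toy two-state simulator are illustrative assumptions, not the authors' specific algorithms, which allocate the simulation budget adaptively.

import random

def sample_value(state, stage, horizon, actions, simulate, gamma, n_samples):
    """Recursively estimate the value of 'state' at the given stage by simulation.

    simulate(state, action) is assumed to draw one transition from the unknown
    model and return (reward, next_state); the model itself is never accessed
    in explicit form.
    """
    if stage == horizon:
        return 0.0
    best = float("-inf")
    for action in actions(state):
        total = 0.0
        for _ in range(n_samples):  # uniform sample allocation per action
            reward, next_state = simulate(state, action)
            total += reward + gamma * sample_value(
                next_state, stage + 1, horizon, actions, simulate, gamma, n_samples
            )
        best = max(best, total / n_samples)
    return best

# Toy two-state, two-action simulator used only for illustration.
def toy_simulate(state, action):
    if action == 0:                       # "safe" action: stay put, reward 1
        return 1.0, state
    next_state = random.choice([0, 1])    # "risky" action: random jump
    return (3.0 if next_state == 1 else 0.0), next_state

print(sample_value(state=0, stage=0, horizon=3,
                   actions=lambda s: [0, 1],
                   simulate=toy_simulate, gamma=0.9, n_samples=20))

    Continuing with the toy simulator above, the next sketch is loosely in the spirit of Chapter 4: a distribution over policies is sampled, the sampled policies are scored by simulated return, and the distribution is re-fitted to the best-scoring policies. It is written here as a plain cross-entropy-style update, which the review notes is closely related to MRAS; the Bernoulli parametrization of the policy space, the elite fraction, and the smoothing factor are assumptions for illustration only.

def estimate_return(policy_probs, simulate, horizon, gamma, n_episodes=20):
    """Monte Carlo estimate of the discounted return of a randomized policy.

    policy_probs[s] is the probability of choosing action 1 in state s.
    """
    total = 0.0
    for _ in range(n_episodes):
        state, discount, ret = 0, 1.0, 0.0
        for _ in range(horizon):
            action = 1 if random.random() < policy_probs[state] else 0
            reward, state = simulate(state, action)
            ret += discount * reward
            discount *= gamma
        total += ret
    return total / n_episodes

def distribution_search(simulate, n_states=2, horizon=5, gamma=0.9,
                        n_policies=50, elite_frac=0.2, n_iterations=20):
    """Cross-entropy-style search over a distribution on the policy space."""
    probs = [0.5] * n_states                      # initial sampling distribution
    for _ in range(n_iterations):
        # Sample candidate (deterministic) policies from the current distribution.
        candidates = [[1.0 if random.random() < p else 0.0 for p in probs]
                      for _ in range(n_policies)]
        candidates.sort(key=lambda c: estimate_return(c, simulate, horizon, gamma),
                        reverse=True)
        elite = candidates[:max(1, int(elite_frac * n_policies))]
        # Re-fit the distribution to the elite samples, with light smoothing.
        probs = [0.9 * sum(c[s] for c in elite) / len(elite) + 0.1 * probs[s]
                 for s in range(n_states)]
    return probs

print(distribution_search(toy_simulate))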
    Markov decision process
    simulation
    algorithm
    discounting
    optimal policy
