Simulation-based algorithms for Markov decision processes (Q1946768)
From MaRDI portal
Full work available at URL: https://doi.org/10.1007/978-1-4471-5022-0
OpenAlex ID: W4302608015
| Language | Label | Description | Also known as |
|---|---|---|---|
| English | Simulation-based algorithms for Markov decision processes | scientific article | |
Statements
Simulation-based algorithms for Markov decision processes (English)
16 April 2013
The monograph is devoted to Markov decision process (MDP) models, which are widely used for modeling sequential decision-making problems. Such problems arise in engineering, economics, computer science, and the social sciences. The monograph is the second, extended edition of the book first published over six years ago, and it presents the latest developments in the theory and the relevant algorithms developed by the authors in the MDP field. The book consists of five chapters. Chapter 1 gives a formal description of the discounted-reward MDP framework, covering both the finite- and infinite-horizon settings, and summarizes the associated optimality equations. Chapter 2 presents simulation-based algorithms for estimating the optimal value function in finite-horizon MDPs with large (possibly uncountable) state spaces, where the usual techniques of policy iteration and value iteration are either computationally impractical or infeasible to implement. Chapter 3 is devoted to infinite-horizon problems and evolutionary approaches for finding an optimal policy. Chapter 4 presents a global optimization approach called Model Reference Adaptive Search (MRAS), which provides a broad framework for updating a probability distribution over the solution space in a way that ensures convergence to an optimal solution. In Chapter 5 the authors consider approximate rolling-horizon control of MDPs with large state/action spaces, carried out in an online manner by simulation. This well-written book is addressed to researchers in MDPs and applied modeling with an interest in numerical computations, but it is also accessible to graduate students in operations research, computer science, and economics. The authors give many pseudocodes of algorithms, numerical examples, convergence analyses of the algorithms, and bibliographical notes that can be very helpful for readers to understand the ideas presented in the book and to perform experiments on their own.
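For orientation, the optimality equations summarized in Chapter 1 take the following standard form (in generic notation, not necessarily that of the book): for a finite horizon \(H\), discount factor \(\gamma\), one-stage reward \(R\), and transition kernel \(P\),
\[
V_i^*(s) = \max_{a \in A(s)} \Big\{ R(s,a) + \gamma \sum_{s'} P(s' \mid s, a)\, V_{i+1}^*(s') \Big\}, \qquad V_H^*(s) = 0,
\]
while the infinite-horizon discounted value function satisfies the fixed-point equation
\[
V^*(s) = \max_{a \in A(s)} \Big\{ R(s,a) + \gamma \sum_{s'} P(s' \mid s, a)\, V^*(s') \Big\}.
\]
As a rough illustration of the simulation-based estimation discussed in Chapter 2, the sketch below estimates the finite-horizon optimal value by recursive Monte Carlo sampling from a generative model. It uses uniform (non-adaptive) sampling and is only a simplified stand-in for the multi-stage adaptive sampling algorithms analysed in the book; the simulator `simulate`, the toy two-state example, and all parameter names are hypothetical.

```python
import random

def estimate_value(simulate, actions, state, stage, horizon,
                   num_samples=8, gamma=1.0):
    """Monte Carlo estimate of the optimal value V_stage(state).

    `simulate(state, action)` is assumed to return a sampled
    (reward, next_state) pair from a generative model of the MDP.
    Uniform sampling only -- a simplified sketch, not the adaptive
    (bandit-based) allocation treated in the book.
    """
    if stage == horizon:                       # horizon reached: value is zero
        return 0.0
    best = float("-inf")
    for action in actions:
        total = 0.0
        for _ in range(num_samples):           # sample the generative model
            reward, next_state = simulate(state, action)
            total += reward + gamma * estimate_value(
                simulate, actions, next_state, stage + 1, horizon,
                num_samples, gamma)
        best = max(best, total / num_samples)  # empirical Q-value, then maximize
    return best

# Hypothetical two-state toy MDP: action 1 tends to reach state 1, which pays 1.
def simulate(state, action):
    next_state = 1 if (action == 1 or random.random() < 0.2) else 0
    return (1.0 if next_state == 1 else 0.0), next_state

print(estimate_value(simulate, actions=[0, 1], state=0, stage=0, horizon=3))
```

The cost of this naive version grows exponentially in the horizon, which is precisely why the adaptive sample-allocation schemes developed in the book matter for large state spaces.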
Markov decision process
multi-stage adaptive sampling
population-based evolutionary method
model reference adaptive search
simulation
optimal policy