On bidecision processes (Q1340581)

The author studies a so-called Markov bidecision process, obtained from the standard Markov decision process by incorporating both maximization and minimization steps. With the help of an extended optimality equation, he constructs a pair of policies that maximize (resp. minimize) the total reward in some sense. The pair of policies is found by a policy iteration method.

reviewed by

Karl-Heinz Waldmann

zbMATH Keywords

Markov bidecision process

extended optimality equation

policy iteration
