Approximate gradient methods in policy-space optimization of Markov reward processes (Q1870312)

From MaRDI portal





Language: English
Label: Approximate gradient methods in policy-space optimization of Markov reward processes
Description: scientific article

    Statements

    Approximate gradient methods in policy-space optimization of Markov reward processes (English)
    11 May 2003
    This paper considers a discrete-time, finite-state Markov reward process that depends on a set of parameters. After a brief review of stochastic gradient descent methods, the authors develop simulation-based algorithms that can be implemented online and have the property that the gradient of the average reward converges to zero with probability one. The gradient updates can have a high variance, resulting in slow convergence. Two approaches based on approximate gradient formulas are proposed to reduce this variance, and bounds for the resulting bias terms are derived. The methodology is applied to the policy-space optimization of Markov reward processes.
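    To make the simulation-based gradient idea concrete, the following is a minimal sketch of a likelihood-ratio (score-function) gradient estimate of the average reward for a small two-state Markov reward process whose transition probabilities depend on a parameter vector theta. The two-state chain, the sigmoid parameterization, and the fixed-horizon estimator are illustrative assumptions, not the specific algorithm of the paper; the naive single-trajectory estimator deliberately exhibits the high variance that motivates the variance-reduction approaches discussed above.

    import numpy as np

    rng = np.random.default_rng(0)

    def next_state_probs(theta, state):
        # Probability of moving to state 1 from `state`, via a sigmoid of theta[state].
        p1 = 1.0 / (1.0 + np.exp(-theta[state]))
        return np.array([1.0 - p1, p1])

    def reward(state):
        # State-dependent reward: state 1 pays 1, state 0 pays 0.
        return 1.0 if state == 1 else 0.0

    def episode_gradient(theta, horizon=200):
        # Score-function (likelihood-ratio) gradient estimate from one finite
        # trajectory.  It is biased for the long-run average reward (finite
        # horizon) and has high variance.
        state = 0
        total_r = 0.0
        score = np.zeros_like(theta)
        for _ in range(horizon):
            probs = next_state_probs(theta, state)
            nxt = rng.choice(2, p=probs)
            # d/dtheta[state] of log P(nxt | state): derivative of the log-sigmoid.
            score[state] += (1.0 - probs[1]) if nxt == 1 else -probs[1]
            total_r += reward(nxt)
            state = nxt
        return (total_r / horizon) * score, total_r / horizon

    # Stochastic gradient ascent on the estimated average reward.
    theta = np.zeros(2)
    step = 0.05
    for _ in range(300):
        grad, avg_reward = episode_gradient(theta)
        theta += step * grad
    print("average reward estimate:", round(avg_reward, 3), "theta:", theta)

    Averaging many such estimates or subtracting a baseline are generic remedies for the variance; the review above indicates that the paper instead uses approximate gradient formulas whose bias terms can be bounded.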
    Markov reward processes
    simulation-based optimization
    policy-space optimization

    Identifiers