Approximate gradient methods in policy-space optimization of Markov reward processes (Q1870312)
From MaRDI portal
scientific article
Language | Label | Description | Also known as |
---|---|---|---|
English | Approximate gradient methods in policy-space optimization of Markov reward processes | scientific article | |
Statements
Approximate gradient methods in policy-space optimization of Markov reward processes (English)
11 May 2003
This paper considers a discrete-time, finite-state Markov reward process that depends on a set of tunable parameters. After a brief review of stochastic gradient descent methods, simulation-based gradient algorithms are developed; they can be implemented online and have the property that the gradient of the average reward converges to zero with probability one. The gradient updates can have high variance, which leads to slow convergence. Two approaches for reducing this variance, both based on approximate gradient formulas, are presented, and bounds on the resulting bias terms are derived. The methodology is applied to Markov reward processes.
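The setting described above can be illustrated with a minimal sketch: a two-state Markov reward process whose transition probabilities depend on a scalar parameter, optimized by gradient ascent on the exact average reward. The example is hypothetical and uses a central-difference gradient for simplicity, rather than the paper's simulation-based gradient estimates; the sigmoid parameterization and the fixed return probability `q` are illustrative assumptions.

```python
import math

def avg_reward(theta, q=0.5, r=(0.0, 1.0)):
    """Average reward of a two-state Markov reward process.

    Transition 0 -> 1 occurs with probability p = sigmoid(theta)
    (illustrative parameterization); transition 1 -> 0 occurs with
    fixed probability q.  The stationary distribution of this chain
    is pi = (q, p) / (p + q), so the average reward is pi . r.
    """
    p = 1.0 / (1.0 + math.exp(-theta))
    pi0, pi1 = q / (p + q), p / (p + q)
    return pi0 * r[0] + pi1 * r[1]

def grad(theta, h=1e-5):
    # Central-difference approximation to d(avg_reward)/d(theta);
    # the paper instead estimates this gradient by simulation.
    return (avg_reward(theta + h) - avg_reward(theta - h)) / (2 * h)

theta, lr = 0.0, 1.0
eta0 = avg_reward(theta)        # average reward before optimization
for _ in range(200):
    theta += lr * grad(theta)   # gradient ascent on the average reward
print(eta0, avg_reward(theta))  # average reward increases toward 2/3
```

In this toy chain the average reward is p/(p + q), so ascent pushes p toward 1 and the average reward toward 1/(1 + q); a simulation-based estimator would replace `grad` with a noisy estimate, which is where the paper's variance-reduction techniques and bias bounds become relevant.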
Markov reward processes
simulation-based optimization
policy-space optimization