Approximate gradient methods in policy-space optimization of Markov reward processes (Q1870312)

From MaRDI portal
scientific article

    Statements

    11 May 2003
    This paper considers a discrete-time, finite-state Markov reward process that depends on a set of parameters. After a brief review of stochastic gradient descent methods, the authors present simulation-based gradient algorithms that can be implemented online and have the property that the gradient of the average reward converges to zero with probability one. The gradient updates can have high variance, which results in slow convergence. Two approaches, based on approximate gradient formulas, are proposed to reduce this variance, and bounds for the resulting bias terms are derived. The methodology is illustrated on examples of Markov reward processes.
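    The online gradient methods reviewed above can be sketched with a minimal likelihood-ratio (score-function) estimator. The following is an illustrative toy example, not the paper's construction: the two-state chain, the sigmoid parameterization, the eligibility-trace discount `beta`, and the running-average baseline are all assumptions chosen for brevity.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def step(state, theta, rng):
    """One transition of a hypothetical two-state Markov reward process.

    In state 0 we move to state 1 with probability sigmoid(theta);
    state 1 always returns to state 0. Reward is 1 on entering state 1.
    Returns (next_state, reward, score), where score is the derivative
    d/dtheta log P(next_state | state, theta) used by the estimator.
    """
    if state == 0:
        p = sigmoid(theta)
        next_state = 1 if rng.random() < p else 0
        score = (1.0 - p) if next_state == 1 else -p
    else:
        next_state = 0
        score = 0.0  # this transition does not depend on theta
    reward = 1.0 if next_state == 1 else 0.0
    return next_state, reward, score


def online_gradient_ascent(theta=0.0, steps=20000, lr=0.01, seed=0):
    """Online stochastic gradient ascent on the average reward.

    Each update multiplies an eligibility trace of score functions by
    the centered reward; the trace discount `beta` trades bias against
    variance, in the spirit of the approximate gradient formulas the
    paper analyzes (the specific constants here are illustrative).
    """
    rng = random.Random(seed)
    state = 0
    z = 0.0      # eligibility trace of accumulated score functions
    beta = 0.9   # trace discount: smaller beta -> less variance, more bias
    avg = 0.0    # running estimate of the average reward (baseline)
    for t in range(1, steps + 1):
        state, reward, score = step(state, theta, rng)
        z = beta * z + score
        avg += (reward - avg) / t
        theta += lr * (reward - avg) * z  # gradient-ascent update
    return theta, avg
```

    For this toy chain the average reward is sigmoid(theta) / (1 + sigmoid(theta)), which is increasing in theta, so the online updates should drift theta upward while the running average reward stays between 0 and 1.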
    Markov reward processes
    simulation-based optimization
    policy-space optimization

    Identifiers