Approximate gradient methods in policy-space optimization of Markov reward processes (Q1870312)

From MaRDI portal





Language: English
Label: Approximate gradient methods in policy-space optimization of Markov reward processes
Description: scientific article

    Statements

    Approximate gradient methods in policy-space optimization of Markov reward processes (English)
    11 May 2003
    This paper considers a discrete-time, finite-state Markov reward process that depends on a set of parameters. After a brief review of stochastic gradient descent methods, the authors develop simulation-based algorithms that can be implemented online and have the property that the gradient of the average reward converges to zero with probability one. The gradient updates can have a high variance, resulting in slow convergence. Two approaches based on approximate gradient formulas are proposed to reduce this variance, and bounds for the resulting bias terms are derived. The methodology is applied to the policy-space optimization of Markov reward processes.
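    To make the simulation-based gradient idea concrete, the following is a minimal sketch of a likelihood-ratio (score-function) gradient estimate of the average reward for a small two-state Markov reward process whose transition probabilities depend on a parameter vector theta. The two-state chain, the sigmoid parameterization, and the fixed-horizon estimator are illustrative assumptions, not the specific algorithm of the paper; the naive single-trajectory estimator deliberately exhibits the high variance that motivates the variance-reduction approaches discussed above.

    import numpy as np

    rng = np.random.default_rng(0)

    def next_state_probs(theta, state):
        # Probability of moving to state 1 from `state`, via a sigmoid of theta[state].
        p1 = 1.0 / (1.0 + np.exp(-theta[state]))
        return np.array([1.0 - p1, p1])

    def reward(state):
        # State-dependent reward: state 1 pays 1, state 0 pays 0.
        return 1.0 if state == 1 else 0.0

    def episode_gradient(theta, horizon=200):
        # Score-function (likelihood-ratio) gradient estimate from one finite
        # trajectory.  It is biased for the long-run average reward (finite
        # horizon) and has high variance.
        state = 0
        total_r = 0.0
        score = np.zeros_like(theta)
        for _ in range(horizon):
            probs = next_state_probs(theta, state)
            nxt = rng.choice(2, p=probs)
            # d/dtheta[state] of log P(nxt | state): derivative of the log-sigmoid.
            score[state] += (1.0 - probs[1]) if nxt == 1 else -probs[1]
            total_r += reward(nxt)
            state = nxt
        return (total_r / horizon) * score, total_r / horizon

    # Stochastic gradient ascent on the estimated average reward.
    theta = np.zeros(2)
    step = 0.05
    for _ in range(300):
        grad, avg_reward = episode_gradient(theta)
        theta += step * grad
    print("average reward estimate:", round(avg_reward, 3), "theta:", theta)

    Averaging many such estimates or subtracting a baseline are generic remedies for the variance; the review above indicates that the paper instead uses approximate gradient formulas whose bias terms can be bounded.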
    Markov reward processes
    simulation-based optimization
    policy-space optimization

    Identifiers