Temporal difference-based policy iteration for optimal control of stochastic systems (Q467477)

From MaRDI portal
scientific article
Language: English

    Statements

    3 November 2014
    The authors consider an infinite-horizon stochastic optimal control problem for a discrete-time dynamic system with additive random noise and a discounted average cost performance index. To find an optimal feedback control law, they use approximate dynamic programming (see [\textit{W. B. Powell}, Approximate dynamic programming. Solving the curses of dimensionality. Hoboken, NJ: John Wiley \& Sons (2007; Zbl 1156.90021)]), in which the cost-to-go function is estimated by a temporal difference-based learning algorithm (see [\textit{R. S. Sutton}, ``Learning to predict by the methods of temporal differences'', Mach. Learn. 3, 9--44 (1988)]). The main contribution of the paper is a continuous least squares policy evaluation algorithm that enables potential-based policy iteration in a continuous state space. The algorithm is derived by solving a fixed-point equation based on the discounted Poisson equation. A continuous least squares temporal difference algorithm is also derived, and a class of basis functions in the form of Euclidean distance functions is proposed to simplify the computations. The proposed methodology is illustrated by simulation examples.
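    To make the least squares temporal difference idea concrete, the sketch below evaluates a fixed policy from sampled transitions using a linear value approximation with Euclidean-distance features. It is only an illustration of the generic LSTD(0) fixed-point solve, not the authors' algorithm: the basis form `phi_i(s) = ||s - c_i||`, the ridge term, and all function names are assumptions made for this example.

    ```python
    import numpy as np

    def euclidean_basis(centers):
        """Features phi_i(s) = ||s - c_i|| for a set of centers c_i
        (this specific basis form is assumed for illustration)."""
        def phi(s):
            return np.linalg.norm(s - centers, axis=1)
        return phi

    def lstd(transitions, phi, gamma=0.95):
        """Generic LSTD(0) policy evaluation: with V(s) ~= phi(s)^T w,
        solve A w = b, where over sampled transitions (s, r, s')
            A = sum phi(s) (phi(s) - gamma * phi(s'))^T,
            b = sum phi(s) * r."""
        k = len(phi(transitions[0][0]))
        A = np.zeros((k, k))
        b = np.zeros(k)
        for s, r, s_next in transitions:
            f, f_next = phi(s), phi(s_next)
            A += np.outer(f, f - gamma * f_next)
            b += f * r
        # small ridge term for numerical stability (an added assumption)
        w = np.linalg.solve(A + 1e-8 * np.eye(k), b)
        return w  # value estimate: V(s) ~= phi(s) @ w
    ```

    In a policy-iteration loop, the weight vector returned here would play the role of the evaluated cost-to-go for the current policy, which is then improved greedily.
    
    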
    stochastic optimal control
    least squares policy evaluation algorithm
    approximate dynamic programming
    learning algorithms
    discrete-time systems
