Temporal difference-based policy iteration for optimal control of stochastic systems (Q467477)
scientific article
Language | Label | Description | Also known as
---|---|---|---
English | Temporal difference-based policy iteration for optimal control of stochastic systems | scientific article |
Statements
Temporal difference-based policy iteration for optimal control of stochastic systems (English)
3 November 2014
The authors consider an infinite-horizon stochastic optimal control problem for a discrete-time dynamic system with additive random noise and a discounted average-cost performance index. To find an optimal feedback control law, they use approximate dynamic programming (see [\textit{W. B. Powell}, Approximate dynamic programming. Solving the curses of dimensionality. Hoboken, NJ: John Wiley \& Sons (2007; Zbl 1156.90021)]), in which the cost-to-go function is estimated by a temporal difference-based learning algorithm (see [\textit{R. S. Sutton}, ``Learning to predict by the methods of temporal differences'', Mach. Learn. 3, 9--44 (1988)]). The main contribution of the paper is a continuous least squares policy evaluation algorithm that enables potential-based policy iteration in a continuous state space; the algorithm is derived by solving a fixed-point equation based on the discounted Poisson equation. A continuous least squares temporal difference algorithm is also derived, and a class of basis functions in the form of Euclidean distance functions is proposed to simplify the computations. The proposed methodology is illustrated by simulation examples.
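For readers who want to experiment with the evaluation step, the sketch below shows a standard least squares temporal difference (LSTD) policy-evaluation computation with Euclidean-distance basis functions. It is a minimal illustration under generic assumptions (a hypothetical scalar linear system, a fixed feedback law, arbitrary parameter choices), not a reproduction of the authors' continuous-state algorithm.

```python
# Minimal LSTD(0) policy-evaluation sketch; not the paper's exact algorithm.
# The cost-to-go of a fixed policy is approximated as J(x) ~ w^T phi(x),
# with w solving the sample-based fixed-point condition
#     sum_t phi(x_t) (phi(x_t) - gamma * phi(x_{t+1}))^T w = sum_t phi(x_t) c_t.
import numpy as np

def distance_basis(x, centers):
    # Euclidean-distance features phi_i(x) = ||x - c_i|| (illustrative choice,
    # in the spirit of the basis functions proposed in the paper).
    return np.linalg.norm(centers - x, axis=1)

def lstd_policy_evaluation(transitions, centers, gamma=0.95, ridge=1e-8):
    # Accumulate A = sum phi (phi - gamma * phi')^T and b = sum phi * cost,
    # then solve A w = b (a small ridge term guards against singularity).
    k = len(centers)
    A = np.zeros((k, k))
    b = np.zeros(k)
    for x, cost, x_next in transitions:
        phi = distance_basis(x, centers)
        phi_next = distance_basis(x_next, centers)
        A += np.outer(phi, phi - gamma * phi_next)
        b += cost * phi
    return np.linalg.solve(A + ridge * np.eye(k), b)

# Hypothetical example: a scalar linear system x' = 0.8 x + u + noise under
# the fixed feedback law u = -0.3 x, with quadratic stage cost x^2 + 0.1 u^2.
rng = np.random.default_rng(0)
centers = np.linspace(-3.0, 3.0, 9).reshape(-1, 1)
x = np.array([1.0])
transitions = []
for _ in range(2000):
    u = -0.3 * x
    cost = float(x @ x + 0.1 * u @ u)
    x_next = 0.8 * x + u + 0.1 * rng.standard_normal(1)
    transitions.append((x, cost, x_next))
    x = x_next
w = lstd_policy_evaluation(transitions, centers)
print("estimated cost-to-go at x = 0:", w @ distance_basis(np.zeros(1), centers))
```

In a policy-iteration loop, the weights returned by such an evaluation step would be used to define a greedy policy update before the next round of evaluation.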
stochastic optimal control
least squares policy evaluation algorithm
approximate dynamic programming
learning algorithms
discrete-time systems