Temporal difference-based policy iteration for optimal control of stochastic systems (Q467477)
scientific article
Language | Label | Description | Also known as
---|---|---|---
English | Temporal difference-based policy iteration for optimal control of stochastic systems | scientific article |
Statements
Temporal difference-based policy iteration for optimal control of stochastic systems (English)
3 November 2014
The authors consider an infinite-horizon stochastic optimal control problem for a discrete-time dynamic system with additive random noise and a discounted average-cost performance index. To find an optimal feedback control law, they use approximate dynamic programming (see [\textit{W. B. Powell}, Approximate dynamic programming. Solving the curses of dimensionality. Hoboken, NJ: John Wiley \& Sons (2007; Zbl 1156.90021)]), in which the cost-to-go function is estimated by a temporal difference-based learning algorithm (see [\textit{R. S. Sutton}, ``Learning to predict by the methods of temporal differences'', Mach. Learn. 3, 9--44 (1988)]). The main contribution of the paper is a continuous least squares policy evaluation algorithm that enables potential-based policy iteration in a continuous state space; the algorithm is derived by solving a fixed-point equation based on the discounted Poisson equation. A continuous least squares temporal difference algorithm is also derived, and a class of basis functions in the form of Euclidean distance functions is proposed to simplify the computations. The proposed methodology is illustrated by simulation examples.
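For readers who want to experiment with the evaluation step, the sketch below shows a standard least squares temporal difference (LSTD) policy-evaluation computation with Euclidean-distance basis functions. It is a minimal illustration under generic assumptions (a hypothetical scalar linear system, a fixed feedback law, arbitrary parameter choices), not a reproduction of the authors' continuous-state algorithm.

```python
# Minimal LSTD(0) policy-evaluation sketch; not the paper's exact algorithm.
# The cost-to-go of a fixed policy is approximated as J(x) ~ w^T phi(x),
# with w solving the sample-based fixed-point condition
#     sum_t phi(x_t) (phi(x_t) - gamma * phi(x_{t+1}))^T w = sum_t phi(x_t) c_t.
import numpy as np

def distance_basis(x, centers):
    # Euclidean-distance features phi_i(x) = ||x - c_i|| (illustrative choice,
    # in the spirit of the basis functions proposed in the paper).
    return np.linalg.norm(centers - x, axis=1)

def lstd_policy_evaluation(transitions, centers, gamma=0.95, ridge=1e-8):
    # Accumulate A = sum phi (phi - gamma * phi')^T and b = sum phi * cost,
    # then solve A w = b (a small ridge term guards against singularity).
    k = len(centers)
    A = np.zeros((k, k))
    b = np.zeros(k)
    for x, cost, x_next in transitions:
        phi = distance_basis(x, centers)
        phi_next = distance_basis(x_next, centers)
        A += np.outer(phi, phi - gamma * phi_next)
        b += cost * phi
    return np.linalg.solve(A + ridge * np.eye(k), b)

# Hypothetical example: a scalar linear system x' = 0.8 x + u + noise under
# the fixed feedback law u = -0.3 x, with quadratic stage cost x^2 + 0.1 u^2.
rng = np.random.default_rng(0)
centers = np.linspace(-3.0, 3.0, 9).reshape(-1, 1)
x = np.array([1.0])
transitions = []
for _ in range(2000):
    u = -0.3 * x
    cost = float(x @ x + 0.1 * u @ u)
    x_next = 0.8 * x + u + 0.1 * rng.standard_normal(1)
    transitions.append((x, cost, x_next))
    x = x_next
w = lstd_policy_evaluation(transitions, centers)
print("estimated cost-to-go at x = 0:", w @ distance_basis(np.zeros(1), centers))
```

In a policy-iteration loop, the weights returned by such an evaluation step would be used to define a greedy policy update before the next round of evaluation.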
stochastic optimal control
least squares policy evaluation algorithm
approximate dynamic programming
learning algorithms
discrete-time systems