Temporal difference-based policy iteration for optimal control of stochastic systems
From MaRDI portal
Keywords: stochastic optimal control; approximate dynamic programming; learning algorithms; discrete-time systems; least squares policy evaluation algorithm
MSC classification: Dynamic programming (90C39); Existence of optimal solutions to problems involving randomness (49J55); Dynamic programming in optimal control and differential games (49L20); Discrete-time control/observation systems (93C55); Stochastic systems in control theory (general) (93E03); Optimal stochastic control (93E20)
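The keywords above reference temporal-difference learning and least-squares policy evaluation. As an illustrative sketch only (not the algorithm of the indexed publication), tabular TD(0) policy evaluation on a small finite Markov chain with an assumed discounted-reward setup looks like this:

```python
import random

def td0_value_estimate(transitions, rewards, gamma=0.9, alpha=0.1,
                       steps=2000, seed=0):
    """Tabular TD(0) policy evaluation on a finite Markov chain.

    transitions[s] is a list of (next_state, probability) pairs;
    rewards[s] is the immediate reward received on leaving state s.
    Illustrative sketch only -- not the method of the cited paper.
    """
    rng = random.Random(seed)
    V = [0.0] * len(transitions)
    s = 0
    for _ in range(steps):
        # Sample the next state from the chain's transition distribution.
        r = rng.random()
        cum = 0.0
        for s_next, p in transitions[s]:
            cum += p
            if r <= cum:
                break
        # TD(0) update: move V[s] toward the bootstrapped target.
        V[s] += alpha * (rewards[s] + gamma * V[s_next] - V[s])
        s = s_next
    return V

# Hypothetical two-state chain: each state switches with probability 0.7;
# only state 0 yields reward, so V[0] should exceed V[1].
transitions = [[(0, 0.3), (1, 0.7)], [(0, 0.7), (1, 0.3)]]
rewards = [1.0, 0.0]
V = td0_value_estimate(transitions, rewards)
```

Under these assumed dynamics the fixed point satisfies V = r + gamma * P * V, so the estimates for both states converge toward roughly 5.4 and 4.6; a policy iteration scheme would alternate such an evaluation step with greedy policy improvement.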
Recommendations
- Potential-based least-squares policy iteration for a parameterized feedback control system
- Continuous-time Markov decision processes with nonzero terminal reward
- Approximation, estimation and control of stochastic systems under a randomized discounted cost criterion
- Undiscounted control policy generation for continuous-valued optimal control by approximate dynamic programming
- The complexity of dynamic programming
- An optimal one-way multigrid algorithm for discrete-time stochastic control
- Least squares policy evaluation algorithms with linear function approximation
- Approximate Q Learning for Controlled Diffusion Processes and Its Near Optimality
- Approximation of optimal feedback control: a dynamic programming approach
- Stable Optimal Control and Semicontractive Dynamic Programming
Cites work
- scientific article; zbMATH DE number 3126094
- scientific article; zbMATH DE number 1321699
- 10.1162/1532443041827907
- A review of stochastic algorithms with continuous value function approximation and some new approximate policy iteration algorithms for multidimensional continuous applications
- A unified approach to Markov decision problems and performance sensitivity analysis
- A unified approach to Markov decision problems and performance sensitivity analysis with discounted and average criteria: multichain cases
- An analysis of temporal-difference learning with function approximation
- Approximate Dynamic Programming
- Approximate policy iteration: a survey and some new methods
- Convergence Results for Some Temporal Difference Methods Based on Least Squares
- Least squares policy evaluation algorithms with linear function approximation
- Linear least-squares algorithms for temporal difference learning
- Markov chains and stochastic stability
- On the use of the deterministic Lyapunov function for the ergodicity of stochastic difference equations
- Perturbation realization, potentials, and sensitivity analysis of Markov processes
- Policy iteration based feedback control
- Potential-Based Online Policy Iteration Algorithms for Markov Decision Processes
- Projected equation methods for approximate solution of large linear systems
- Single sample path-based optimization of Markov chains
- Stochastic control via direct comparison
Cited in (10)
- Stochastic linear quadratic optimal control for continuous-time systems based on policy iteration
- Suboptimal control for nonlinear systems with disturbance via integral sliding mode control and policy iteration
- A switching control strategy for policy selection in stochastic dynamic programming problems
- Stochastic control via direct comparison
- Potential-based least-squares policy iteration for a parameterized feedback control system
- On policy iteration-based discounted optimal control
- Undiscounted control policy generation for continuous-valued optimal control by approximate dynamic programming
- Two Time-Scale Stochastic Approximation with Controlled Markov Noise and Off-Policy Temporal-Difference Learning
- Policy iteration based feedback control
- A least squares temporal difference actor–critic algorithm with applications to warehouse management
This page was built for publication: Temporal difference-based policy iteration for optimal control of stochastic systems
MaRDI item Q467477