Temporal difference-based policy iteration for optimal control of stochastic systems
DOI: 10.1007/s10957-013-0418-1 · zbMATH Open: 1306.93074 · OpenAlex: W2080453320 · MaRDI QID: Q467477
Xiao-Mei Liu, Kanjian Zhang, Haikun Wei, Kang Cheng, Shumin Fei
Publication date: 3 November 2014
Published in: Journal of Optimization Theory and Applications
Full work available at URL: https://doi.org/10.1007/s10957-013-0418-1
Recommendations
- Potential-based least-squares policy iteration for a parameterized feedback control system
- Continuous-time Markov decision processes with nonzero terminal reward
- Approximation, estimation and control of stochastic systems under a randomized discounted cost criterion
- Undiscounted control policy generation for continuous-valued optimal control by approximate dynamic programming
- The complexity of dynamic programming
- An optimal one-way multigrid algorithm for discrete-time stochastic control
- Least squares policy evaluation algorithms with linear function approximation
- Approximate Q Learning for Controlled Diffusion Processes and Its Near Optimality
- Approximation of optimal feedback control: a dynamic programming approach
- Stable Optimal Control and Semicontractive Dynamic Programming
Keywords: stochastic optimal control; approximate dynamic programming; learning algorithms; discrete-time systems; least squares policy evaluation algorithm
MSC classification: Dynamic programming (90C39); Existence of optimal solutions to problems involving randomness (49J55); Dynamic programming in optimal control and differential games (49L20); Discrete-time control/observation systems (93C55); Stochastic systems in control theory (general) (93E03); Optimal stochastic control (93E20)
Cites Work
- [Title not available]
- Markov chains and stochastic stability
- [Title not available]
- Perturbation realization, potentials, and sensitivity analysis of Markov processes
- On the use of the deterministic Lyapunov function for the ergodicity of stochastic difference equations
- Approximate Dynamic Programming
- Single sample path-based optimization of Markov chains
- Least squares policy evaluation algorithms with linear function approximation
- Policy iteration based feedback control
- Approximate policy iteration: a survey and some new methods
- A review of stochastic algorithms with continuous value function approximation and some new approximate policy iteration algorithms for multidimensional continuous applications
- DOI: 10.1162/1532443041827907
- An analysis of temporal-difference learning with function approximation
- Projected equation methods for approximate solution of large linear systems
- Potential-Based Online Policy Iteration Algorithms for Markov Decision Processes
- A unified approach to Markov decision problems and performance sensitivity analysis
- Linear least-squares algorithms for temporal difference learning
- Convergence Results for Some Temporal Difference Methods Based on Least Squares
- Stochastic control via direct comparison
- A unified approach to Markov decision problems and performance sensitivity analysis with discounted and average criteria: multichain cases
Cited In (4)
- A least squares temporal difference actor–critic algorithm with applications to warehouse management
- Potential-based least-squares policy iteration for a parameterized feedback control system
- Two Time-Scale Stochastic Approximation with Controlled Markov Noise and Off-Policy Temporal-Difference Learning
- Suboptimal control for nonlinear systems with disturbance via integral sliding mode control and policy iteration