Approximate policy iteration: a survey and some new methods
Publication:2887629
DOI: 10.1007/S11768-011-1005-3
zbMath: 1249.90179
OpenAlex: W2124477018
MaRDI QID: Q2887629
Publication date: 1 June 2012
Published in: Journal of Control Theory and Applications
Full work available at URL: https://doi.org/10.1007/s11768-011-1005-3
Related Items (37)
- Undiscounted control policy generation for continuous-valued optimal control by approximate dynamic programming
- Potential-based least-squares policy iteration for a parameterized feedback control system
- A partial history of the early development of continuous-time nonlinear stochastic systems theory
- Value iteration and adaptive dynamic programming for data-driven adaptive optimal control design
- New approximate dynamic programming algorithms for large-scale undiscounted Markov decision processes and their application to optimize a production and distribution system
- Approximate dynamic programming for the dispatch of military medical evacuation assets
- A perturbation approach to a class of discounted approximate value iteration algorithms with Borel spaces
- Perspectives of approximate dynamic programming
- Discrete-time dynamic graphical games: model-free reinforcement learning solution
- Human motor learning is robust to control-dependent noise
- Approximate dynamic programming for the military inventory routing problem
- Simple and Optimal Methods for Stochastic Variational Inequalities, I: Operator Extrapolation
- Improved value iteration for neural-network-based stochastic optimal control design
- Robust adaptive dynamic programming for linear and nonlinear systems: an overview
- H∞ optimal control of unknown linear systems by adaptive dynamic programming with applications to time‐delay systems
- Smoothing policies and safe policy gradients
- A Lyapunov-based version of the value iteration algorithm formulated as a discrete-time switched affine system
- Unnamed Item
- Data-driven optimal control via linear transfer operators: a convex approach
- On the Taylor Expansion of Value Functions
- Temporal difference-based policy iteration for optimal control of stochastic systems
- Approximate dynamic programming for missile defense interceptor fire control
- A Machine Learning Approach to Adaptive Robust Utility Maximization and Hedging
- Proximal algorithms and temporal difference methods for solving fixed point problems
- Tracking control optimization scheme for a class of partially unknown fuzzy systems by using integral reinforcement learning architecture
- An Approximate Dynamic Programming Algorithm for Monotone Value Functions
- Empirical Dynamic Programming
- A perturbation approach to approximate value iteration for average cost Markov decision processes with Borel spaces and bounded costs
- Time-varying Markov decision processes with state-action-dependent discount factors and unbounded costs
- Bias-policy iteration based adaptive dynamic programming for unknown continuous-time linear systems
- Convex optimization with an interpolation-based projection and its application to deep learning
- Incremental constraint projection methods for variational inequalities
- Multiply Accelerated Value Iteration for NonSymmetric Affine Fixed Point Problems and Application to Markov Decision Processes
- Robust Reinforcement Learning for Stochastic Linear Quadratic Control with Multiplicative Noise
- Approximative Policy Iteration for Exit Time Feedback Control Problems Driven by Stochastic Differential Equations using Tensor Train Format
- Allocating resources via price management systems: a dynamic programming-based approach
- Unnamed Item
Cites Work
- Unnamed Item
- Unnamed Item
- Unnamed Item
- Unnamed Item
- Unnamed Item
- Unnamed Item
- Unnamed Item
- Unnamed Item
- Unnamed Item
- A generalized Kalman filter for fixed point approximation and efficient temporal-difference learning
- Simulation-based algorithms for Markov decision processes.
- Projected equation methods for approximate solution of large linear systems
- Stochastic approximation. A dynamical systems viewpoint.
- Stochastic optimal control. The discrete time case
- Technical update: Least-squares temporal difference learning
- Average cost temporal-difference learning
- Practical issues in temporal difference learning
- Simulation-based optimization: Parametric optimization techniques and reinforcement learning
- Least squares policy evaluation algorithms with linear function approximation
- Feature-based methods for large scale dynamic programming
- A tutorial on the cross-entropy method
- Basis function adaptation in temporal difference reinforcement learning
- Q-Learning and Enhanced Policy Iteration in Discounted Dynamic Programming
- Error Bounds for Approximations from Projected Linear Equations
- Learning Tetris Using the Noisy Cross-Entropy Method
- Projection methods for variational inequalities with application to the traffic assignment problem
- Distributed dynamic programming
- Monotone Operators and the Proximal Point Algorithm
- Monotone Mappings with Application in Dynamic Programming
- On the Convergence of Stochastic Iterative Dynamic Programming Algorithms
- An analysis of temporal-difference learning with function approximation
- Optimal stopping of Markov processes: Hilbert space theory, approximation algorithms, and an application to pricing high-dimensional financial derivatives
- Approximate Dynamic Programming via a Smoothed Linear Program
- 10.1162/1532443041827907
- Convergence Results for Some Temporal Difference Methods Based on Least Squares
- Approximate Dynamic Programming
- A Quasi Monte Carlo Method for Large-Scale Inverse Problems
- Performance Loss Bounds for Approximate Value Iteration with State Aggregation
- Neuro-Dynamic Programming: An Overview and Recent Results
- Simulation and the Monte Carlo Method
- Control Techniques for Complex Networks
- Contraction Mappings in the Theory Underlying Dynamic Programming
- Stochastic Games
- Monte Carlo strategies in scientific computing