Approximate policy iteration: a survey and some new methods
Publication: 2887629
DOI: 10.1007/s11768-011-1005-3
zbMath: 1249.90179
OpenAlex: W2124477018
MaRDI QID: Q2887629
Publication date: 1 June 2012
Published in: Journal of Control Theory and Applications
Full work available at URL: https://doi.org/10.1007/s11768-011-1005-3
Related Items
- Undiscounted control policy generation for continuous-valued optimal control by approximate dynamic programming
- Potential-based least-squares policy iteration for a parameterized feedback control system
- A partial history of the early development of continuous-time nonlinear stochastic systems theory
- Value iteration and adaptive dynamic programming for data-driven adaptive optimal control design
- New approximate dynamic programming algorithms for large-scale undiscounted Markov decision processes and their application to optimize a production and distribution system
- Approximate dynamic programming for the dispatch of military medical evacuation assets
- A perturbation approach to a class of discounted approximate value iteration algorithms with Borel spaces
- Perspectives of approximate dynamic programming
- Discrete-time dynamic graphical games: model-free reinforcement learning solution
- Human motor learning is robust to control-dependent noise
- Approximate dynamic programming for the military inventory routing problem
- Simple and Optimal Methods for Stochastic Variational Inequalities, I: Operator Extrapolation
- Improved value iteration for neural-network-based stochastic optimal control design
- Robust adaptive dynamic programming for linear and nonlinear systems: an overview
- H∞ optimal control of unknown linear systems by adaptive dynamic programming with applications to time-delay systems
- Smoothing policies and safe policy gradients
- A Lyapunov-based version of the value iteration algorithm formulated as a discrete-time switched affine system
- Data-driven optimal control via linear transfer operators: a convex approach
- On the Taylor Expansion of Value Functions
- Temporal difference-based policy iteration for optimal control of stochastic systems
- Approximate dynamic programming for missile defense interceptor fire control
- A Machine Learning Approach to Adaptive Robust Utility Maximization and Hedging
- Proximal algorithms and temporal difference methods for solving fixed point problems
- Tracking control optimization scheme for a class of partially unknown fuzzy systems by using integral reinforcement learning architecture
- An Approximate Dynamic Programming Algorithm for Monotone Value Functions
- Empirical Dynamic Programming
- A perturbation approach to approximate value iteration for average cost Markov decision processes with Borel spaces and bounded costs
- Time-varying Markov decision processes with state-action-dependent discount factors and unbounded costs
- Bias-policy iteration based adaptive dynamic programming for unknown continuous-time linear systems
- Convex optimization with an interpolation-based projection and its application to deep learning
- Incremental constraint projection methods for variational inequalities
- Multiply Accelerated Value Iteration for NonSymmetric Affine Fixed Point Problems and Application to Markov Decision Processes
- Robust Reinforcement Learning for Stochastic Linear Quadratic Control with Multiplicative Noise
- Approximative Policy Iteration for Exit Time Feedback Control Problems Driven by Stochastic Differential Equations using Tensor Train Format
- Allocating resources via price management systems: a dynamic programming-based approach
Cites Work
- A generalized Kalman filter for fixed point approximation and efficient temporal-difference learning
- Simulation-based algorithms for Markov decision processes
- Projected equation methods for approximate solution of large linear systems
- Stochastic approximation. A dynamical systems viewpoint
- Stochastic optimal control. The discrete time case
- Technical update: Least-squares temporal difference learning
- Average cost temporal-difference learning
- Practical issues in temporal difference learning
- Simulation-based optimization: Parametric optimization techniques and reinforcement learning
- Least squares policy evaluation algorithms with linear function approximation
- Feature-based methods for large scale dynamic programming
- A tutorial on the cross-entropy method
- Basis function adaptation in temporal difference reinforcement learning
- Q-Learning and Enhanced Policy Iteration in Discounted Dynamic Programming
- Error Bounds for Approximations from Projected Linear Equations
- Learning Tetris Using the Noisy Cross-Entropy Method
- Projection methods for variational inequalities with application to the traffic assignment problem
- Distributed dynamic programming
- Monotone Operators and the Proximal Point Algorithm
- Monotone Mappings with Application in Dynamic Programming
- On the Convergence of Stochastic Iterative Dynamic Programming Algorithms
- An analysis of temporal-difference learning with function approximation
- Optimal stopping of Markov processes: Hilbert space theory, approximation algorithms, and an application to pricing high-dimensional financial derivatives
- Approximate Dynamic Programming via a Smoothed Linear Program
- DOI: 10.1162/1532443041827907
- Convergence Results for Some Temporal Difference Methods Based on Least Squares
- Approximate Dynamic Programming
- A Quasi Monte Carlo Method for Large-Scale Inverse Problems
- Performance Loss Bounds for Approximate Value Iteration with State Aggregation
- Neuro-Dynamic Programming: An Overview and Recent Results
- Simulation and the Monte Carlo Method
- Control Techniques for Complex Networks
- Contraction Mappings in the Theory Underlying Dynamic Programming
- Stochastic Games
- Monte Carlo strategies in scientific computing