Approximate policy iteration: a survey and some new methods
From MaRDI portal
Publication: 2887629
DOI: 10.1007/s11768-011-1005-3 · zbMATH Open: 1249.90179 · OpenAlex: W2124477018 · MaRDI QID: Q2887629
Authors: Dimitri P. Bertsekas
Publication date: 1 June 2012
Published in: Journal of Control Theory and Applications
Full work available at URL: https://doi.org/10.1007/s11768-011-1005-3
Cites Work
- Title not available
- Monte Carlo strategies in scientific computing
- Convex optimization theory.
- Stochastic Games
- Stochastic approximation. A dynamical systems viewpoint.
- Projection methods for variational inequalities with application to the traffic assignment problem
- Monotone Operators and the Proximal Point Algorithm
- Title not available
- Simulation and the Monte Carlo Method
- Stochastic optimal control. The discrete time case
- Stochastic learning and optimization. A sensitivity-based approach.
- A tutorial on the cross-entropy method
- Title not available
- Title not available
- Approximate Dynamic Programming
- Least squares policy evaluation algorithms with linear function approximation
- Feature-based methods for large scale dynamic programming
- Title not available (DOI: 10.1162/1532443041827907)
- Neuro-Dynamic Programming: An Overview and Recent Results
- Linear least-squares algorithms for temporal difference learning
- Title not available
- Learning Tetris Using the Noisy Cross-Entropy Method
- An analysis of temporal-difference learning with function approximation
- Tetris: A study of randomized constraint sampling
- Simulation-based algorithms for Markov decision processes.
- Control Techniques for Complex Networks
- Average cost temporal-difference learning
- Simulation-based optimization: Parametric optimization techniques and reinforcement learning
- Approximate Dynamic Programming via a Smoothed Linear Program
- Q-learning and enhanced policy iteration in discounted dynamic programming
- On the Convergence of Stochastic Iterative Dynamic Programming Algorithms
- Performance Loss Bounds for Approximate Value Iteration with State Aggregation
- Contraction Mappings in the Theory Underlying Dynamic Programming
- Optimal stopping of Markov processes: Hilbert space theory, approximation algorithms, and an application to pricing high-dimensional financial derivatives
- Projected equation methods for approximate solution of large linear systems
- Technical update: Least-squares temporal difference learning
- Distributed dynamic programming
- A generalized Kalman filter for fixed point approximation and efficient temporal-difference learning
- Basis function adaptation in temporal difference reinforcement learning
- Convergence Results for Some Temporal Difference Methods Based on Least Squares
- Practical issues in temporal difference learning
- Error bounds for approximations from projected linear equations
- A quasi Monte Carlo method for large-scale inverse problems
- Monotone Mappings with Application in Dynamic Programming
Cited In (55)
- Policy iterations for reinforcement learning problems in continuous time and space -- fundamental theory and methods
- Multiply accelerated value iteration for nonsymmetric affine fixed point problems and application to Markov decision processes
- A machine learning approach to adaptive robust utility maximization and hedging
- Convex optimization with an interpolation-based projection and its application to deep learning
- Truncated policy iteration methods
- On the Taylor expansion of value functions
- A partial history of the early development of continuous-time nonlinear stochastic systems theory
- Time-varying Markov decision processes with state-action-dependent discount factors and unbounded costs
- Dynamic policy programming
- Generalized planning as heuristic search: a new planning search-space that leverages pointers over objects
- Human motor learning is robust to control-dependent noise
- Discrete-time dynamic graphical games: model-free reinforcement learning solution
- Simple and optimal methods for stochastic variational inequalities. I: Operator extrapolation
- An approximate dynamic programming algorithm for monotone value functions
- Q-learning and enhanced policy iteration in discounted dynamic programming
- Title not available
- Policy evaluation with temporal differences: a survey and comparison
- Potential-based least-squares policy iteration for a parameterized feedback control system
- Temporal difference-based policy iteration for optimal control of stochastic systems
- A Class of Decision Processes Showing Policy-Improvement/Newton–Raphson Equivalence
- H∞ optimal control of unknown linear systems by adaptive dynamic programming with applications to time‐delay systems
- Approximative policy iteration for exit time feedback control problems driven by stochastic differential equations using tensor train format
- Approximate dynamic programming for the military inventory routing problem
- Value iteration and adaptive dynamic programming for data-driven adaptive optimal control design
- Undiscounted control policy generation for continuous-valued optimal control by approximate dynamic programming
- Robust adaptive dynamic programming for linear and nonlinear systems: an overview
- Provably Near-Optimal Approximation Schemes for Implicit Stochastic and Sample-Based Dynamic Programs
- New approximate dynamic programming algorithms for large-scale undiscounted Markov decision processes and their application to optimize a production and distribution system
- Approximate dynamic programming for the dispatch of military medical evacuation assets
- Performance bounds for \(\lambda \) policy iteration and application to the game of Tetris
- Approximate dynamic programming for missile defense interceptor fire control
- Parametric Approximation Policy Iteration Algorithm Based on Gaussian Process
- A perturbation approach to approximate value iteration for average cost Markov decision processes with Borel spaces and bounded costs
- A perturbation approach to a class of discounted approximate value iteration algorithms with Borel spaces
- Perspectives of approximate dynamic programming
- Performance Loss Bounds for Approximate Value Iteration with State Aggregation
- On the convergence of simulation-based iterative methods for solving singular linear systems
- Least squares policy iteration with instrumental variables vs. direct policy search: comparison against optimal benchmarks using energy storage
- Least squares policy evaluation algorithms with linear function approximation
- Data-driven optimal control via linear transfer operators: a convex approach
- Tracking control optimization scheme for a class of partially unknown fuzzy systems by using integral reinforcement learning architecture
- Allocating resources via price management systems: a dynamic programming-based approach
- Empirical dynamic programming
- Incremental constraint projection methods for variational inequalities
- Bias-policy iteration based adaptive dynamic programming for unknown continuous-time linear systems
- Smoothing policies and safe policy gradients
- Rollout sampling approximate policy iteration
- Analysis of classification-based policy iteration algorithms
- Improved value iteration for neural-network-based stochastic optimal control design
- A Lyapunov-based version of the value iteration algorithm formulated as a discrete-time switched affine system
- On the existence of fixed points for approximate value iteration and temporal-difference learning
- Hybrid least-squares algorithms for approximate policy evaluation
- Title not available
- Robust reinforcement learning for stochastic linear quadratic control with multiplicative noise
- Proximal algorithms and temporal difference methods for solving fixed point problems