Approximate policy iteration: a survey and some new methods
From MaRDI portal
Publication:2887629
Cites work
- scientific article; zbMATH DE number 3846795 (no title available)
- scientific article; zbMATH DE number 3528420 (no title available)
- scientific article; zbMATH DE number 1321699 (no title available)
- scientific article; zbMATH DE number 700091 (no title available)
- scientific article; zbMATH DE number 1012640 (no title available)
- DOI: 10.1162/1532443041827907 (no title available)
- A generalized Kalman filter for fixed point approximation and efficient temporal-difference learning
- A quasi Monte Carlo method for large-scale inverse problems
- A tutorial on the cross-entropy method
- An analysis of temporal-difference learning with function approximation
- Approximate Dynamic Programming
- Approximate Dynamic Programming via a Smoothed Linear Program
- Average cost temporal-difference learning
- Basis function adaptation in temporal difference reinforcement learning
- Contraction Mappings in the Theory Underlying Dynamic Programming
- Control Techniques for Complex Networks
- Convergence Results for Some Temporal Difference Methods Based on Least Squares
- Convex optimization theory
- Distributed dynamic programming
- Error bounds for approximations from projected linear equations
- Feature-based methods for large scale dynamic programming
- Learning Tetris Using the Noisy Cross-Entropy Method
- Least squares policy evaluation algorithms with linear function approximation
- Linear least-squares algorithms for temporal difference learning
- Monotone Mappings with Application in Dynamic Programming
- Monotone Operators and the Proximal Point Algorithm
- Monte Carlo strategies in scientific computing
- Neuro-Dynamic Programming: An Overview and Recent Results
- On the Convergence of Stochastic Iterative Dynamic Programming Algorithms
- Optimal stopping of Markov processes: Hilbert space theory, approximation algorithms, and an application to pricing high-dimensional financial derivatives
- Performance Loss Bounds for Approximate Value Iteration with State Aggregation
- Practical issues in temporal difference learning
- Projected equation methods for approximate solution of large linear systems
- Projection methods for variational inequalities with application to the traffic assignment problem
- Q-learning and enhanced policy iteration in discounted dynamic programming
- Simulation and the Monte Carlo Method
- Simulation-based algorithms for Markov decision processes
- Simulation-based optimization: Parametric optimization techniques and reinforcement learning
- Stochastic Games
- Stochastic approximation. A dynamical systems viewpoint
- Stochastic learning and optimization. A sensitivity-based approach
- Stochastic optimal control. The discrete time case
- Technical update: Least-squares temporal difference learning
- Tetris: A study of randomized constraint sampling
Cited in (55)
- Improved value iteration for neural-network-based stochastic optimal control design
- Human motor learning is robust to control-dependent noise
- Convex optimization with an interpolation-based projection and its application to deep learning
- Allocating resources via price management systems: a dynamic programming-based approach
- Policy evaluation with temporal differences: a survey and comparison
- Truncated policy iteration methods
- An approximate dynamic programming algorithm for monotone value functions
- Data-driven optimal control via linear transfer operators: a convex approach
- Tracking control optimization scheme for a class of partially unknown fuzzy systems by using integral reinforcement learning architecture
- On the Taylor expansion of value functions
- Proximal algorithms and temporal difference methods for solving fixed point problems
- Rollout sampling approximate policy iteration
- Discrete-time dynamic graphical games: model-free reinforcement learning solution
- Analysis of classification-based policy iteration algorithms
- Approximate dynamic programming for missile defense interceptor fire control
- A partial history of the early development of continuous-time nonlinear stochastic systems theory
- Potential-based least-squares policy iteration for a parameterized feedback control system
- Hybrid least-squares algorithms for approximate policy evaluation
- Provably Near-Optimal Approximation Schemes for Implicit Stochastic and Sample-Based Dynamic Programs
- Undiscounted control policy generation for continuous-valued optimal control by approximate dynamic programming
- Policy iterations for reinforcement learning problems in continuous time and space -- fundamental theory and methods
- Robust adaptive dynamic programming for linear and nonlinear systems: an overview
- H∞ optimal control of unknown linear systems by adaptive dynamic programming with applications to time‐delay systems
- scientific article; zbMATH DE number 7370614 (no title available)
- Parametric Approximation Policy Iteration Algorithm Based on Gaussian Process
- Multiply accelerated value iteration for nonsymmetric affine fixed point problems and application to Markov decision processes
- A Lyapunov-based version of the value iteration algorithm formulated as a discrete-time switched affine system
- Performance Loss Bounds for Approximate Value Iteration with State Aggregation
- Time-varying Markov decision processes with state-action-dependent discount factors and unbounded costs
- Empirical dynamic programming
- On the convergence of simulation-based iterative methods for solving singular linear systems
- A perturbation approach to approximate value iteration for average cost Markov decision processes with Borel spaces and bounded costs
- A machine learning approach to adaptive robust utility maximization and hedging
- Dynamic policy programming
- Temporal difference-based policy iteration for optimal control of stochastic systems
- Approximate dynamic programming for the military inventory routing problem
- Q-learning and enhanced policy iteration in discounted dynamic programming
- Value iteration and adaptive dynamic programming for data-driven adaptive optimal control design
- Least squares policy iteration with instrumental variables vs. direct policy search: comparison against optimal benchmarks using energy storage
- Approximative policy iteration for exit time feedback control problems driven by stochastic differential equations using tensor train format
- A Class of Decision Processes Showing Policy-Improvement/Newton–Raphson Equivalence
- Incremental constraint projection methods for variational inequalities
- New approximate dynamic programming algorithms for large-scale undiscounted Markov decision processes and their application to optimize a production and distribution system
- Approximate dynamic programming for the dispatch of military medical evacuation assets
- Robust reinforcement learning for stochastic linear quadratic control with multiplicative noise
- Bias-policy iteration based adaptive dynamic programming for unknown continuous-time linear systems
- Least squares policy evaluation algorithms with linear function approximation
- A perturbation approach to a class of discounted approximate value iteration algorithms with Borel spaces
- Generalized planning as heuristic search: a new planning search-space that leverages pointers over objects
- Perspectives of approximate dynamic programming
- Performance bounds for \(\lambda \) policy iteration and application to the game of Tetris
- On the existence of fixed points for approximate value iteration and temporal-difference learning
- Simple and optimal methods for stochastic variational inequalities. I: Operator extrapolation
- Smoothing policies and safe policy gradients
- scientific article; zbMATH DE number 6542806 (no title available)