Empirical dynamic programming
DOI: 10.1287/MOOR.2015.0733
zbMATH Open: 1338.49055
arXiv: 1311.5918
OpenAlex: W2593952959
MaRDI QID: Q2806811
Authors: William B. Haskell, Rahul Jain, Dileep Kalathil
Publication date: 19 May 2016
Published in: Mathematics of Operations Research
Full work available at URL: https://arxiv.org/abs/1311.5918
Keywords: simulation; dynamic programming; Markov decision processes; random operators; empirical methods; probabilistic fixed points
MSC classification:
- Numerical mathematical programming methods (65K05)
- Empirical decision procedures; empirical Bayes procedures (62C12)
- Dynamic programming (90C39)
- Stochastic programming (90C15)
- Simulation of dynamical systems (37M05)
- Dynamic programming in optimal control and differential games (49L20)
- Markov and semi-Markov decision processes (90C40)
- Optimal stochastic control (93E20)
- Random linear operators (47B80)
- Random operators and equations (aspects of stochastic analysis) (60H25)
- Random dynamical systems (37H99)
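The keywords above point at the paper's central construction: an empirical Bellman operator in which the exact expectation over next states is replaced by an average over simulated transitions, and value iteration becomes iteration of a random operator. As an illustrative sketch only (a toy MDP of our own devising, not code from the paper), empirical value iteration might look like:

```python
import numpy as np

# Toy MDP: 3 states, 2 actions, discount 0.9. The transition kernel and
# rewards below are arbitrary illustrative values, not from the paper.
rng = np.random.default_rng(0)
n_states, n_actions, gamma = 3, 2, 0.9

# True transition kernel P[a, s, s'] (rows are distributions) and rewards r[s, a].
P = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))
r = rng.uniform(0.0, 1.0, size=(n_states, n_actions))

def empirical_bellman(v, n_samples=200):
    """Empirical Bellman operator: the expectation over next states is
    replaced by an average over n_samples simulated transitions per (s, a)."""
    q = np.empty((n_states, n_actions))
    for s in range(n_states):
        for a in range(n_actions):
            next_states = rng.choice(n_states, size=n_samples, p=P[a, s])
            q[s, a] = r[s, a] + gamma * v[next_states].mean()
    return q.max(axis=1)

# Empirical value iteration: iterate the random operator.
v = np.zeros(n_states)
for _ in range(60):
    v = empirical_bellman(v)

# Exact value iteration on the same MDP, for comparison.
v_exact = np.zeros(n_states)
for _ in range(200):
    v_exact = (r + gamma * np.einsum('ast,t->sa', P, v_exact)).max(axis=1)

print(np.max(np.abs(v - v_exact)))  # sampling error, shrinking as n_samples grows
```

The iterates of the random operator hover near the true fixed point rather than converging to it exactly, which is the behavior the paper's notion of a probabilistic fixed point is designed to capture.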
Cites Work
- A Stochastic Approximation Method
- Title not available
- Stochastic Games
- Q-learning
- Comparison methods for stochastic models and risks
- The Complexity of Markov Decision Processes
- Analysis of recursive stochastic algorithms
- Neural Network Learning
- Approximate policy iteration: a survey and some new methods
- Actor-Critic-Type Learning Algorithms for Markov Decision Processes
- Stochastic Estimation of the Maximum of a Regression Function
- Functional Approximations and Dynamic Programming
- Approximations of Dynamic Programs, I
- Using Randomization to Break the Curse of Dimensionality
- Learning algorithms for Markov decision processes with average cost
- Finite-time bounds for fitted value iteration
- The O.D.E. Method for Convergence of Stochastic Approximation and Reinforcement Learning
- CONVERGENCE OF SIMULATION-BASED POLICY ITERATION
- Associative search network: A reinforcement learning associative memory
- Approximate Fixed Point Iteration with an Application to Infinite Horizon Markov Decision Processes
- Q-Learning for Risk-Sensitive Control
- DOI 10.1162/153244303768966102 (title not available)
- Title not available
- Approximations of Dynamic Programs, II
- Convergence rate of linear two-time-scale stochastic approximation
- A survey of some simulation-based algorithms for Markov decision processes
- Simulation-based Uniform Value Function Estimates of Markov Decision Processes
- Simulation-based optimization of Markov decision processes: an empirical process theory approach
- Performance guarantees for empirical Markov decision processes with applications to multiperiod inventory models
Cited In (13)
- Some limit properties of Markov chains induced by recursive stochastic algorithms
- Title not available
- Dynamic policy programming
- Robustness to incorrect models and data-driven learning in average-cost optimal stochastic control
- A concentration bound for contractive stochastic approximation
- Stochastic and adaptive optimal control of uncertain interconnected systems: a data-driven approach
- A simulation-based approach to stochastic dynamic programming
- Convergence of Recursive Stochastic Algorithms Using Wasserstein Divergence
- Empirical \(Q\)-value iteration
- Gradient-bounded dynamic programming for submodular and concave extensible value functions with probabilistic performance guarantees
- Mean-field controls with Q-learning for cooperative MARL: convergence and complexity analysis
- Distributionally robust optimization for sequential decision-making
- Anderson acceleration for partially observable Markov decision processes: a maximum entropy approach