Empirical dynamic programming

From MaRDI portal
Publication:2806811

DOI: 10.1287/MOOR.2015.0733
zbMATH Open: 1338.49055
arXiv: 1311.5918
OpenAlex: W2593952959
MaRDI QID: Q2806811
FDO: Q2806811


Authors: William B. Haskell, Rahul Jain, Dileep Kalathil


Publication date: 19 May 2016

Published in: Mathematics of Operations Research

Abstract: We propose empirical dynamic programming algorithms for Markov decision processes (MDPs). In these algorithms, the exact expectation in the Bellman operator of classical value iteration is replaced by an empirical estimate to obtain "empirical value iteration" (EVI); policy evaluation and policy improvement in classical policy iteration are likewise replaced by simulation to obtain "empirical policy iteration" (EPI). These empirical dynamic programming algorithms thus involve the iteration of a random operator, the empirical Bellman operator. We introduce notions of probabilistic fixed points for such random monotone operators and develop a stochastic dominance framework for their convergence analysis, which we use to derive sample complexity bounds for both EVI and EPI. We then present several variations and extensions, including asynchronous empirical dynamic programming and a minimax empirical dynamic program, and show how the framework can also be used to solve the dynamic newsvendor problem. Preliminary experimental results suggest a faster rate of convergence than stochastic approximation algorithms.
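The core idea of EVI described above, replacing the exact expectation in the Bellman operator with a sample average, can be sketched as follows. This is a minimal illustration on a hypothetical toy MDP (the transition kernel, rewards, sample size, and iteration count are all illustrative assumptions, not taken from the paper):

```python
import numpy as np

# Hypothetical toy MDP: 4 states, 2 actions (illustrative assumption).
rng = np.random.default_rng(0)
n_states, n_actions = 4, 2
gamma = 0.9  # discount factor

# Random transition kernel: P[s, a] is a distribution over next states.
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
# Rewards in [0, 1].
R = rng.uniform(0.0, 1.0, size=(n_states, n_actions))

def empirical_bellman(v, n_samples=100):
    """One application of the empirical Bellman operator: the exact
    expectation E[v(s')] is replaced by an average over sampled next
    states, making the operator random."""
    v_new = np.empty(n_states)
    for s in range(n_states):
        q = np.empty(n_actions)
        for a in range(n_actions):
            # Simulate n_samples transitions from (s, a) and average v.
            samples = rng.choice(n_states, size=n_samples, p=P[s, a])
            q[a] = R[s, a] + gamma * v[samples].mean()
        v_new[s] = q.max()
    return v_new

# Empirical value iteration: iterate the random operator from v = 0.
v = np.zeros(n_states)
for _ in range(200):
    v = empirical_bellman(v)
```

Because each iterate applies a random operator, the sequence of value functions does not converge deterministically; the paper's probabilistic fixed-point notions and stochastic dominance framework are what make the convergence analysis and sample complexity bounds possible.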


Full work available at URL: https://arxiv.org/abs/1311.5918










Cited In (13)





This page was built for publication: Empirical dynamic programming
