An approximate dynamic programming algorithm for monotone value functions
From MaRDI portal
Abstract: Many sequential decision problems can be formulated as Markov Decision Processes (MDPs) where the optimal value function (or cost-to-go function) can be shown to satisfy a monotone structure in some or all of its dimensions. When the state space becomes large, traditional techniques, such as the backward dynamic programming algorithm (i.e., backward induction or value iteration), may no longer be effective in finding a solution within a reasonable time frame, and thus we are forced to consider other approaches, such as approximate dynamic programming (ADP). We propose a provably convergent ADP algorithm called Monotone-ADP that exploits the monotonicity of the value functions in order to increase the rate of convergence. In this paper, we describe a general finite-horizon problem setting where the optimal value function is monotone, present a convergence proof for Monotone-ADP under various technical assumptions, and show numerical results for three application domains: optimal stopping, energy storage/allocation, and glycemic control for diabetes patients. The empirical results indicate that by taking advantage of monotonicity, we can attain high quality solutions within a relatively small number of iterations, using up to two orders of magnitude less computation than is needed to compute the optimal solution exactly.
Recommendations
- Approximate dynamic programming by practical examples
- Two adaptively stepped monotone algorithms for solving discounted dynamic programming equations
- Monotone value iteration for discounted finite Markov decision processes
- Dynamic programming and value-function approximation in sequential decision problems: error analysis and numerical results
- Perspectives of approximate dynamic programming
Cites work
- A simple nonparametric estimator of a strictly monotone regression function
- A survey of maintenance models: The control and surveillance of deteriorating systems
- An Empirical Distribution Function for Sampling with Incomplete Information
- An adaptive dynamic programming algorithm for a stochastic multiproduct batch dispatch problem
- An algorithm for approximating piecewise linear concave functions from sample gradients
- An optimal approximate dynamic programming algorithm for the lagged asset acquisition problem
- Approximate policy iteration: a survey and some new methods
- Asynchronous stochastic approximation and Q-learning
- Decomposition and Partitioning Methods for Multistage Stochastic Linear Programs
- Dynamic programming models and algorithms for the mutual fund cash balance problem
- Estimating Smooth Monotone Functions
- Estimating a smooth monotone regression function
- Exploiting structure in adaptive dynamic programming algorithms for a stochastic batch service problem
- How Does the Value Function of a Markov Decision Process Depend on the Transition Probabilities?
- Kernel-based reinforcement learning
- Learning Algorithms for Separable Approximations of Discrete Stochastic Optimization Problems
- Maximum Likelihood Estimates of Monotone Parameters
- Monotone nonparametric regression
- Monotone optimal replacement policies for a Markovian deteriorating system in a controllable environment
- Multi-stage stochastic optimization applied to energy planning
- Optimal Replacement of GMC Bus Engines: An Empirical Model of Harold Zurcher
- Optimal commodity trading with a capacitated storage asset
- Optimal energy commitments with storage and intermittent supply
- Optimal hour-ahead bidding in the real-time electricity market with battery storage using approximate dynamic programming
- Optimizing the simultaneous management of blood pressure and cholesterol for type 2 diabetes patients
- Properties of American option prices
- Regularized decomposition of high-dimensional multistage stochastic programs with Markov uncertainty
- Structural Properties of Stochastic Dynamic Programs
- The sample average approximation method for stochastic discrete optimization
- Towards an Economic Theory of Replacement Investment
- Valuation of energy storage: an optimal switching approach
- \({\mathcal Q}\)-learning
Cited in
(18)- A numerical study of Markov decision process algorithms for multi-component replacement problems
- MIDAS: a mixed integer dynamic approximation scheme
- Risk-averse approximate dynamic programming with quantile-based risk measures
- Shape constraints in economics and operations research
- Solving Markov decision processes via state space decomposition and time aggregation
- Meso-parametric value function approximation for dynamic customer acceptances in delivery routing
- Adaptive value function approximation for continuous-state stochastic dynamic programming
- Deep neural networks algorithms for stochastic control problems on finite horizon: numerical applications
- Linear programming formulation for non-stationary, finite-horizon Markov decision process models
- A perturbation approach to a class of discounted approximate value iteration algorithms with Borel spaces
- Optimal liquidation in a level-I limit order book for large-tick stocks
- Poisoning finite-horizon Markov decision processes at design time
- An optimal approximate dynamic programming algorithm for the lagged asset acquisition problem
- Exploiting structure in adaptive dynamic programming algorithms for a stochastic batch service problem
- Algorithms for solving high dimensional PDEs: from nonlinear Monte Carlo to machine learning
- Energy management for stationary electric energy storage systems: a systematic literature review
- Solving high-dimensional optimal stopping problems using deep learning
- An optimal algorithm for finding all the jumps of a monotone step-function
This page was built for publication: An approximate dynamic programming algorithm for monotone value functions
Report a bug (only for logged in users!)Click here to report a bug for this page (MaRDI item Q2797467)