Learning in structured MDPs with convex cost functions: improved regret bounds for inventory management
DOI: 10.1287/OPRE.2022.2263; zbMATH Open: 1494.90004; arXiv: 1905.04337; OpenAlex: W2944461362; MaRDI QID: Q5095166; FDO: Q5095166
Authors: Shipra Agrawal, Randy Jia
Publication date: 5 August 2022
Published in: Operations Research
Full work available at URL: https://arxiv.org/abs/1905.04337
Recommendations
- Technical note: Perishable inventory systems: convexity results for base-stock policies and learning algorithms under censored demand
- Dynamic Inventory Control with Fixed Setup Costs and Unknown Discrete Demand Distribution
- Near-optimal regret bounds for reinforcement learning
- Regret bounds for restless Markov bandits
- Nonparametric learning algorithms for joint pricing and inventory control with lost sales and censored demand
Keywords: reinforcement learning; regret bounds; inventory control problem; censored demand; online convex optimization
Convex programming (90C25); Inventory, storage, reservoirs (90B05); Transportation, logistics and supply chain management (90B06)
Cites Work
- Partial monitoring -- classification, regret bounds, and algorithms
- An adaptive algorithm for finding the optimal base-stock policy in lost sales inventory systems with censored demand
- Lost-sales inventory theory: a review
- Foundations of inventory management
- A nonparametric asymptotic analysis of inventory planning with censored demand
- Near-optimal regret bounds for reinforcement learning
- A note on the convexity of performance measures of M/M/c queueing systems
- Lost-Sales Problems with Stochastic Lead Times: Convexity Results for Base-Stock Policies
- Asymptotic optimality of order-up-to policies in lost sales inventory systems
- Old and New Methods for Lost-Sales Inventory Systems
- Optimal Server Allocation in a System of Multi-Server Stations
- Note—On the Marginal Benefit of Adding Servers to G/GI/m Queues
- Non-stationary stochastic optimization
Cited In (5)
- A least squares temporal difference actor–critic algorithm with applications to warehouse management
- UCB-type learning algorithms with Kaplan-Meier estimator for lost-sales inventory models with lead times
- Title not available
- Deep reinforcement learning for inventory control: a roadmap
- Reward shaping to improve the performance of deep reinforcement learning in perishable inventory management