Queueing network controls via deep reinforcement learning
DOI: 10.1287/STSY.2021.0081
zbMATH Open: 1489.60145
arXiv: 2008.01644
OpenAlex: W3047127288
MaRDI QID: Q5084497
FDO: Q5084497
Author name not available
Publication date: 24 June 2022
Published in: Stochastic Systems
Abstract: Novel advanced policy gradient (APG) methods, such as trust region policy optimization (TRPO) and proximal policy optimization (PPO), have become the dominant reinforcement learning algorithms because of their ease of implementation and good practical performance. A conventional setup for the notoriously difficult queueing network control problem is a Markov decision problem (MDP) with three features: infinite state space, unbounded costs, and a long-run average cost objective. We extend the theoretical framework of these APG methods to such MDPs. The resulting PPO algorithm is tested on a parallel-server system and on large-size multiclass queueing networks. The algorithm consistently generates control policies that outperform state-of-the-art heuristics in the literature under a variety of load conditions, from light to heavy traffic. These policies are demonstrated to be near-optimal when the optimal policy can be computed. A key to the success of our PPO algorithm is the use of three variance reduction techniques in estimating the relative value function via sampling. First, we use a discounted relative value function as an approximation of the relative value function. Second, we propose regenerative simulation to estimate the discounted relative value function. Finally, we incorporate the approximating martingale-process method into the regenerative estimator.
Full work available at URL: https://arxiv.org/abs/2008.01644
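For orientation, the objects named in the abstract can be written out in standard notation. This is a sketch in textbook conventions, not the paper's own notation: c denotes the one-step cost, pi_theta a parameterized policy, and \hat{A}_t an advantage estimate.

```latex
% Long-run average cost of a stationary policy \pi:
\gamma(\pi) = \lim_{T \to \infty} \frac{1}{T}\,
  \mathbb{E}_{\pi}\Bigl[\sum_{t=0}^{T-1} c(x_t, a_t)\Bigr]

% Relative value function h, via the Poisson equation:
h(x) + \gamma(\pi) = c(x, \pi(x)) + \mathbb{E}\bigl[h(x_{t+1}) \mid x_t = x\bigr]

% Its discounted approximation (\alpha close to 1), the first
% variance-reduction device mentioned in the abstract:
h_\alpha(x) = \mathbb{E}_{\pi}\Bigl[\sum_{t=0}^{\infty} \alpha^{t}
  \bigl(c(x_t, a_t) - \gamma(\pi)\bigr) \Bigm| x_0 = x\Bigr]

% PPO's clipped surrogate objective (stated here in its standard
% reward-maximization form; with costs the sign conventions flip):
L^{\mathrm{CLIP}}(\theta) = \mathbb{E}_t\Bigl[\min\bigl(
  r_t(\theta)\,\hat{A}_t,\;
  \mathrm{clip}(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon)\,\hat{A}_t
  \bigr)\Bigr],
\qquad
r_t(\theta) = \frac{\pi_\theta(a_t \mid x_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid x_t)}
```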
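The second device, regenerative simulation, exploits the fact that a stable queueing network keeps returning to a fixed regeneration state (for example, the empty system), so a long trajectory splits into independent cycles. Below is a minimal Python sketch on a toy single-server queue; the arrival/service probabilities, the holding cost c(x) = x, and the plain Monte Carlo averaging are illustrative assumptions, not the paper's implementation.

```python
import random

# Toy illustration of regenerative simulation on a discrete-time
# single-server queue (a birth-death chain).  All constants below are
# assumptions made for this sketch.

P_ARRIVAL = 0.4   # assumed probability the queue length goes up by 1
P_SERVICE = 0.6   # assumed probability a waiting job departs
ALPHA = 0.998     # discount factor close to 1, per the abstract
REGEN_STATE = 0   # regeneration point: the empty system

def step(x):
    """One transition of the toy queue."""
    u = random.random()
    if u < P_ARRIVAL:
        return x + 1
    if u < P_ARRIVAL + P_SERVICE and x > 0:
        return x - 1
    return x  # only reachable at x = 0: the system stays empty

def average_cost(n_cycles=2000):
    """Regenerative ratio estimator of the long-run average cost:
    E[cost per cycle] / E[cycle length], cycles starting at REGEN_STATE."""
    total_cost, total_len = 0.0, 0
    for _ in range(n_cycles):
        x, t = step(REGEN_STATE), 1
        cost = float(x)
        while x != REGEN_STATE:
            x = step(x)
            cost += x
            t += 1
        total_cost += cost
        total_len += t
    return total_cost / total_len

def discounted_relative_value(x0, gamma, n_reps=2000):
    """Monte Carlo estimate of the discounted relative value function:
    accumulate alpha^t * (c(x_t) - gamma) from x0 until the chain first
    regenerates.  Stopping at regeneration keeps replications short and
    independent; as ALPHA -> 1 the estimate approaches the relative
    value function normalized so that h(REGEN_STATE) = 0."""
    total = 0.0
    for _ in range(n_reps):
        x, disc, h = x0, 1.0, 0.0
        while True:
            h += disc * (x - gamma)   # holding cost c(x) = x, centered
            x = step(x)
            disc *= ALPHA
            if x == REGEN_STATE:
                break
        total += h
    return total / n_reps

if __name__ == "__main__":
    gamma = average_cost()
    print(f"average cost estimate: {gamma:.3f}")  # ~2.0 for these rates
    for x0 in (1, 3, 5):
        print(f"h_alpha({x0}) ~= {discounted_relative_value(x0, gamma):.3f}")
```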
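The third device, the approximating martingale-process (AMP) method of Henderson and Glynn (cited below), can be summarized in one identity. Here \tilde{h} is any approximation of h_\alpha for which the one-step conditional expectation (P\tilde{h})(x) = E[\tilde{h}(x_{t+1}) | x_t = x] can be computed; that computability is an assumption of this sketch.

```latex
% Each increment below has conditional mean zero, so M_T is a mean-zero
% martingale and can be subtracted from the discounted-cost estimator
% as a control variate:
M_T = \sum_{t=0}^{T-1} \alpha^{t+1}
  \Bigl[\tilde{h}(x_{t+1}) - (P\tilde{h})(x_t)\Bigr],
\qquad \mathbb{E}[M_T] = 0

% If \tilde{h} = h_\alpha exactly, the corrected estimator telescopes to
% h_\alpha(x_0) - \alpha^{T} h_\alpha(x_T): the randomness collapses into
% the terminal term, so a good \tilde{h} yields a large variance reduction.
```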
Recommendations
- Value function approximation in complex queueing systems
- The complexity of optimal queuing network control
- Applying reinforcement learning to basic routing problem
- Near optimal control of queueing networks over a finite time horizon
- A reinforcement-learning approach for admission control in distributed network service systems
Mathematics Subject Classification
- Queues and service in operations research (90B22)
- Queueing theory (aspects of probability theory) (60K25)
- Stochastic network models in operations research (90B15)
Cites Work
- Markov Chains and Stochastic Stability
- Batch size effects on the efficiency of control variates in simulation
- Approximating Martingales for Variance Reduction in Markov Process Simulation
- Applied Probability and Queues
- Brownian models of multiclass queueing networks: Current status and open problems
- Technical Note—An Equivalence Between Continuous and Discrete Time Markov Decision Processes
- The Linear Programming Approach to Approximate Dynamic Programming
- On Actor-Critic Algorithms
- Heavy traffic analysis of a system with parallel servers: Asymptotic optimality of discrete-review policies
- Simulation-based optimization of Markov reward processes
- An online actor-critic algorithm with function approximation for constrained Markov decision processes
- Convergence of simulation-based policy iteration
- Optimization of multiclass queueing networks: Polyhedral and nonlinear characterizations of achievable performance
- Value iteration and optimization of multiclass queueing networks
- Approximate linear programming for networks: average cost bounds
- Performance bounds for queueing networks and scheduling policies
- Target-Pursuing Scheduling and Routing Policies for Multiclass Queueing Networks
- State space collapse with application to heavy traffic limits for multiclass queueing networks
- Heavy traffic analysis of open processing networks with complete resource pooling: asymptotic optimality of discrete review policies
- Dynamic scheduling of a system with two parallel servers in heavy traffic with resource pooling: Asymptotic optimality of a threshold policy
- Scheduling Networks of Queues: Heavy Traffic Analysis of a Two-Station Closed Network
- Convergence to equilibria for fluid models of head-of-the-line proportional processor sharing queueing networks
- Brownian models of open processing networks: Canonical representation of workload
- A fluid limit model criterion for instability of multiclass queueing networks
- Re-entrant lines
- Asymptotic optimality of tracking policies in stochastic networks
- Uniformization for semi-Markov decision processes under stationary policies
- Stable, distributed, real-time scheduling of flexible manufacturing/assembly/disassembly systems
- Performance evaluation and policy selection in multiclass networks
- Variance reduction through smoothing and control variates for Markov chain simulations
- Discrete-review policies for scheduling stochastic networks: trajectory tracking and fluid-scale asymptotic optimality
- Performance Analysis of Queueing Networks via Robust Optimization
- Dynamic Scheduling of a Multiclass Fluid Network
- Heavy Traffic Convergence of a Controlled, Multiclass Queueing System
- Fluctuation smoothing policies are stable for stochastic re-entrant lines
- Robust Fluid Processing Networks
- Processing Networks
- A unified perturbation analysis framework for countable Markov chains
Cited In (4)