Queueing network controls via deep reinforcement learning
From MaRDI portal
Publication:5084497
Abstract: Novel advanced policy gradient (APG) methods, such as trust region policy optimization (TRPO) and proximal policy optimization (PPO), have become dominant reinforcement learning algorithms because of their ease of implementation and good practical performance. A conventional setup for notoriously difficult queueing network control problems is a Markov decision process (MDP) with three features: an infinite state space, unbounded costs, and a long-run average cost objective. We extend the theoretical framework of these APG methods to such MDPs. The resulting PPO algorithm is tested on a parallel-server system and on large-size multiclass queueing networks. The algorithm consistently generates control policies that outperform state-of-the-art heuristics from the literature under a variety of load conditions, from light to heavy traffic. These policies are shown to be near-optimal when the optimal policy can be computed. A key to the success of our PPO algorithm is the use of three variance reduction techniques in estimating the relative value function via sampling. First, we use a discounted relative value function as an approximation of the relative value function. Second, we propose regenerative simulation to estimate the discounted relative value function. Finally, we incorporate the approximating martingale-process method into the regenerative estimator.
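The regenerative simulation idea in the abstract can be illustrated with a minimal sketch. The model below (a discrete-time single-server queue with arrival probability 0.4, service probability 0.5, and cost equal to the queue length) and the truncation of each sample path at the regeneration state are illustrative assumptions, not the paper's actual construction, which further combines this estimator with the approximating martingale-process method.

```python
import random

def step(x, p_arrival=0.4, p_service=0.5):
    """One transition of a discrete-time single-server queue (assumed model)."""
    u = random.random()
    if u < p_arrival:
        return x + 1
    if u < p_arrival + p_service and x > 0:
        return x - 1
    return x

def discounted_relative_value(x0, avg_cost, gamma=0.99, n_cycles=200):
    """Regenerative estimate of the discounted relative value function
    V(x0) ~ E[sum_t gamma^t * (c(X_t) - avg_cost)], with stage cost c(x) = x.
    Each sample path starts at x0 and is truncated at the first visit to the
    regeneration state 0, so the (heavily discounted) tail is dropped."""
    total = 0.0
    for _ in range(n_cycles):
        x, t, acc = x0, 0, 0.0
        while True:
            acc += gamma ** t * (x - avg_cost)  # stage cost minus average cost
            x = step(x)
            t += 1
            if x == 0:  # regeneration: the system empties
                break
        total += acc
    return total / n_cycles
```

Because states with longer queues accumulate more cost before the chain regenerates, the estimate grows with the starting queue length, which is the qualitative shape a relative value function should have for this cost.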
Recommendations
- Value function approximation in complex queueing systems
- The complexity of optimal queuing network control
- Applying reinforcement learning to basic routing problem
- Near optimal control of queueing networks over a finite time horizon
- A reinforcement-learning approach for admission control in distributed network service systems
Cites work
- scientific article; zbMATH DE number 1804128
- scientific article; zbMATH DE number 4076265
- scientific article; zbMATH DE number 1188974
- scientific article; zbMATH DE number 3532701
- scientific article; zbMATH DE number 3543391
- scientific article; zbMATH DE number 1060036
- scientific article; zbMATH DE number 1936531
- scientific article; zbMATH DE number 1753152
- scientific article; zbMATH DE number 786521
- scientific article; zbMATH DE number 932423
- A fluid limit model criterion for instability of multiclass queueing networks
- A unified perturbation analysis framework for countable Markov chains
- An online actor-critic algorithm with function approximation for constrained Markov decision processes
- Applied Probability and Queues
- Approximate linear programming for networks: average cost bounds
- Approximating Martingales for Variance Reduction in Markov Process Simulation
- Asymptotic optimality of tracking policies in stochastic networks.
- Batch size effects on the efficiency of control variates in simulation
- Brownian models of multiclass queueing networks: Current status and open problems
- Brownian models of open processing networks: Canonical representation of workload.
- Convergence of simulation-based policy iteration
- Communication networks. An optimization, control and stochastic networks perspective
- Convergence to equilibria for fluid models of head-of-the-line proportional processor sharing queueing networks
- Discrete-review policies for scheduling stochastic networks: trajectory tracking and fluid-scale asymptotic optimality.
- Dynamic Scheduling of a Multiclass Fluid Network
- Dynamic scheduling of a system with two parallel servers in heavy traffic with resource pooling: Asymptotic optimality of a threshold policy
- Fluctuation smoothing policies are stable for stochastic re-entrant lines
- Heavy Traffic Convergence of a Controlled, Multiclass Queueing System
- Heavy traffic analysis of a system with parallel servers: Asymptotic optimality of discrete-review policies
- Heavy traffic analysis of open processing networks with complete resource pooling: asymptotic optimality of discrete review policies
- Markov Chains and Stochastic Stability
- On Actor-Critic Algorithms
- Optimization of multiclass queueing networks: Polyhedral and nonlinear characterizations of achievable performance
- Performance analysis of queueing networks via robust optimization
- Performance bounds for queueing networks and scheduling policies
- Performance evaluation and policy selection in multiclass networks
- Processing Networks
- Re-entrant lines
- Reinforcement learning. An introduction
- Robust Fluid Processing Networks
- Scheduling Networks of Queues: Heavy Traffic Analysis of a Two-Station Closed Network
- Simulation-based optimization of Markov reward processes
- Stable, distributed, real-time scheduling of flexible manufacturing/assembly/disassembly systems
- State space collapse with application to heavy traffic limits for multiclass queueing networks
- Target-Pursuing Scheduling and Routing Policies for Multiclass Queueing Networks
- Technical Note—An Equivalence Between Continuous and Discrete Time Markov Decision Processes
- The Linear Programming Approach to Approximate Dynamic Programming
- Uniformization for semi-Markov decision processes under stationary policies
- Value iteration and optimization of multiclass queueing networks
- Variance reduction through smoothing and control variates for Markov chain simulations
Cited in (8)
- Online self-organizing network control with time averaged weighted throughput objective
- A reinforcement-learning approach for admission control in distributed network service systems
- Logarithmic regret bounds for continuous-time average-reward Markov decision processes
- Learning optimal admission control in partially observable queueing networks
- Advances in Neural Networks – ISNN 2005
- Applying reinforcement learning to basic routing problem
- Q-learning based heterogeneous network self-optimization for reconfigurable network with CPC assistance
- Inhomogeneous deep Q-network for time sensitive applications