No-regret learning for repeated non-cooperative games with lossy bandits
From MaRDI portal
Publication: 6152576
DOI: 10.1016/J.AUTOMATICA.2023.111455
arXiv: 2205.06968
MaRDI QID: Q6152576
FDO: Q6152576
Authors: Wenting Liu, Jinlong Lei, Peng Yi, Yiguang Hong
Publication date: 13 February 2024
Published in: Automatica
Abstract: This paper considers no-regret learning for repeated continuous-kernel games with lossy bandit feedback. Since an explicit model of the utility functions is hard to obtain in dynamic environments, the players can adjust their actions only through bandit feedback. Moreover, because of unreliable communication channels or privacy protection, the bandit feedback may be lost or dropped at random. We therefore study asynchronous online learning strategies that let the players adaptively adjust their next actions so as to minimize the long-term regret. The paper provides a novel no-regret learning algorithm, called Online Gradient Descent with lossy bandits (OGD-lb). We first give a regret analysis for concave games with differentiable and Lipschitz utilities. Then we show that the action profile converges to a Nash equilibrium with probability 1 when the game is also strictly monotone. We further provide the mean-square convergence rate when the game is strongly monotone. In addition, we extend the algorithm to the case where the loss probability of the bandit feedback is unknown, and prove its almost sure convergence to a Nash equilibrium for strictly monotone games. Finally, we take resource management in fog computing as an application example and carry out numerical experiments to demonstrate the algorithm's performance empirically.
Full work available at URL: https://arxiv.org/abs/2205.06968
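The abstract describes, but does not reproduce, the OGD-lb update. As a rough illustration only, the sketch below implements a generic online gradient ascent with one-point bandit feedback and Bernoulli feedback losses, in the spirit of the setting described above. The function name `ogd_lb`, the step-size and exploration schedules (`eta0 / sqrt(t)`, `delta0 / t**0.25`), the ball-shaped action sets, and the toy two-player game are all illustrative assumptions; the paper's exact algorithm, schedules, and constraint sets are given in the article itself.

```python
import numpy as np

rng = np.random.default_rng(0)


def ogd_lb(utilities, dims, radii, T=5000, p_loss=0.3, eta0=0.05, delta0=0.5):
    """Sketch: online gradient ascent with one-point bandit feedback that is
    lost independently with probability p_loss (not the authors' exact method).

    utilities: list of callables u_i(action_profile) -> float, one per player.
    dims:      dimension of each player's action space.
    radii:     radius of each player's ball-shaped action set (assumption).
    """
    n = len(utilities)
    x = [np.zeros(d) for d in dims]              # current actions
    for t in range(1, T + 1):
        eta = eta0 / np.sqrt(t)                  # diminishing step size (assumed schedule)
        delta = delta0 / t ** 0.25               # shrinking exploration radius (assumed schedule)
        # each player perturbs its action along a random unit direction
        dirs = [v / np.linalg.norm(v) for v in (rng.standard_normal(d) for d in dims)]
        played = [x[i] + delta * dirs[i] for i in range(n)]
        payoffs = [utilities[i](played) for i in range(n)]
        for i in range(n):
            if rng.random() < p_loss:
                continue                         # bandit feedback lost: player i keeps its action
            # one-point gradient estimate of player i's own utility
            g_hat = (dims[i] / delta) * payoffs[i] * dirs[i]
            x[i] = x[i] + eta * g_hat            # gradient-ascent step
            norm = np.linalg.norm(x[i])          # project back onto the ball
            if norm > radii[i]:
                x[i] *= radii[i] / norm
    return x


if __name__ == "__main__":
    # toy strongly monotone game with Nash equilibrium at the origin
    u1 = lambda a: -a[0] @ a[0] + 0.1 * (a[0] @ a[1])
    u2 = lambda a: -a[1] @ a[1] + 0.1 * (a[1] @ a[0])
    print(ogd_lb([u1, u2], dims=[2, 2], radii=[1.0, 1.0]))
```

In this toy example the pseudo-gradient mapping has a positive-definite Jacobian, so the game is strongly monotone and the iterates should concentrate near the unique Nash equilibrium at the origin despite the randomly dropped feedback.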
Recommendations
- Generalized mirror descents in congestion games
- Learning in games with continuous action sets and unknown payoff functions
- Learning, regret minimization, and equilibria
- Near-optimal no-regret algorithms for zero-sum games
- A general class of no-regret learning algorithms and game-theoretic equilibria
Cites Work
- Prediction, Learning, and Games
- Logarithmic regret algorithms for online convex optimization
- On the robustness of learning in games with stochastically perturbed payoff observations
- Random gradient-free minimization of convex functions
- Online learning and online convex optimization
- Existence and Uniqueness of Equilibrium Points for Concave N-Person Games
- Efficient algorithms for online decision problems
- The multiplicative weights update method: a meta-algorithm and applications
- On a Stochastic Approximation Method
- A one-measurement form of simultaneous perturbation stochastic approximation
- Corrigendum to: "Stochastic generalized Nash equilibrium seeking under partial-decision information"
- Near-optimal no-regret algorithms for zero-sum games
- Design of Cognitive Radio Systems Under Temperature-Interference Constraints: A Variational Inequality Approach
- Cycles in adversarial regularized learning
- Optimal distributed stochastic mirror descent for strongly convex optimization
- Nash equilibrium seeking for \(N\)-coalition noncooperative games
- An operator splitting approach for distributed generalized Nash equilibria computation
- Generalized Nash equilibrium seeking strategy for distributed nonsmooth multi-cluster game
- Distributed Nash equilibrium seeking under partial-decision information via the alternating direction method of multipliers
- Learning with minimal information in continuous games
- On the equivalence of weak learnability and linear separability: new relaxations and efficient boosting algorithms
- Predictive online convex optimization
- On synchronous, asynchronous, and randomized best-response schemes for stochastic Nash games
- A new one-point residual-feedback oracle for black-box learning and control
- A Distributed Forward–Backward Algorithm for Stochastic Generalized Nash Equilibrium Seeking
- Decentralized online convex optimization based on signs of relative states
- Distributed Bandit Online Convex Optimization With Time-Varying Coupled Inequality Constraints
- Distributed Online Linear Regressions
- An Online Convex Optimization Approach to Proactive Network Resource Allocation
- Online Convex Optimization With Time-Varying Constraints and Bandit Feedback
- Personalized optimization with user's feedback
- Distributed stochastic optimization with large delays
- Game Theory for Big Data Processing: Multileader Multifollower Game-Based ADMM
- Continuous-Time Discounted Mirror Descent Dynamics in Monotone Concave Games
- Secure Mobile Edge Computing in IoT via Collaborative Online Learning