No-regret learning for repeated non-cooperative games with lossy bandits
From MaRDI portal
Publication:6152576
Abstract: This paper considers no-regret learning for repeated continuous-kernel games with lossy bandit feedback. Since explicit models of the utility functions are hard to obtain in dynamic environments, the players can learn their actions only from bandit feedback. Moreover, owing to unreliable communication channels or privacy protection, this bandit feedback may be lost or dropped at random. We therefore study an asynchronous online learning strategy with which the players adaptively adjust their next actions to minimize the long-term regret. The paper proposes a novel no-regret learning algorithm, called Online Gradient Descent with lossy bandits (OGD-lb). We first give a regret analysis for concave games with differentiable and Lipschitz utilities. We then show that the action profile converges to a Nash equilibrium with probability 1 when the game is also strictly monotone, and we provide the mean-square convergence rate when the game is strongly monotone. In addition, we extend the algorithm to the case where the loss probability of the bandit feedback is unknown, and prove its almost sure convergence to a Nash equilibrium for strictly monotone games. Finally, we take resource management in fog computing as an application example and carry out numerical experiments to demonstrate the algorithm's performance empirically.
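The abstract's key mechanism can be sketched in code: each player queries its utility once per round at a randomly perturbed action (one-point bandit feedback), and the gradient-based update is simply skipped in rounds where the feedback packet is lost. The sketch below is illustrative only, assuming a single player, a box-constrained action set, and the standard one-point gradient estimator; the function name `ogd_lossy_bandit` and all parameter values are hypothetical, not the paper's exact scheme.

```python
import numpy as np

rng = np.random.default_rng(0)

def ogd_lossy_bandit(utility, x0, T=5000, delta=0.05, eta=0.01, p_loss=0.3):
    """Illustrative one-player sketch of OGD with lossy one-point bandit feedback.

    utility : callable mapping an action vector to a scalar payoff
    p_loss  : probability that a round's bandit feedback is dropped
    """
    d = len(x0)
    x = np.array(x0, dtype=float)
    for _ in range(T):
        u = rng.standard_normal(d)
        u /= np.linalg.norm(u)                 # random unit perturbation direction
        payoff = utility(x + delta * u)        # single bandit query at perturbed action
        if rng.random() > p_loss:              # feedback arrives with prob. 1 - p_loss
            g_hat = (d / delta) * payoff * u   # one-point gradient estimate of the utility
            x = np.clip(x + eta * g_hat, -1.0, 1.0)  # projected ascent onto the box [-1,1]^d
        # if the feedback is lost, no update is made this round (asynchronous behavior)
    return x
```

With a concave quadratic utility such as `lambda z: -np.sum((z - 0.5)**2)`, the iterate drifts toward the maximizer despite the random feedback drops, at a rate degraded by the loss probability.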
Recommendations
- Generalized mirror descents in congestion games
- Learning in games with continuous action sets and unknown payoff functions
- Learning, regret minimization, and equilibria
- Near-optimal no-regret algorithms for zero-sum games
- A general class of no-regret learning algorithms and game-theoretic equilibria
Cites work
- A Distributed Forward–Backward Algorithm for Stochastic Generalized Nash Equilibrium Seeking
- A new one-point residual-feedback oracle for black-box learning and control
- A one-measurement form of simultaneous perturbation stochastic approximation
- An Online Convex Optimization Approach to Proactive Network Resource Allocation
- An operator splitting approach for distributed generalized Nash equilibria computation
- Continuous-Time Discounted Mirror Descent Dynamics in Monotone Concave Games
- Corrigendum to: ``Stochastic generalized Nash equilibrium seeking under partial-decision information''
- Cycles in adversarial regularized learning
- Decentralized online convex optimization based on signs of relative states
- Design of Cognitive Radio Systems Under Temperature-Interference Constraints: A Variational Inequality Approach
- Distributed Bandit Online Convex Optimization With Time-Varying Coupled Inequality Constraints
- Distributed Nash equilibrium seeking under partial-decision information via the alternating direction method of multipliers
- Distributed Online Linear Regressions
- Distributed stochastic optimization with large delays
- Efficient algorithms for online decision problems
- Existence and Uniqueness of Equilibrium Points for Concave N-Person Games
- Game Theory for Big Data Processing: Multileader Multifollower Game-Based ADMM
- Generalized Nash equilibrium seeking strategy for distributed nonsmooth multi-cluster game
- Learning with minimal information in continuous games
- Logarithmic regret algorithms for online convex optimization
- Nash equilibrium seeking for \(N\)-coalition noncooperative games
- Near-optimal no-regret algorithms for zero-sum games
- On a Stochastic Approximation Method
- On synchronous, asynchronous, and randomized best-response schemes for stochastic Nash games
- On the equivalence of weak learnability and linear separability: new relaxations and efficient boosting algorithms
- On the robustness of learning in games with stochastically perturbed payoff observations
- Online Convex Optimization With Time-Varying Constraints and Bandit Feedback
- Online learning and online convex optimization
- Optimal distributed stochastic mirror descent for strongly convex optimization
- Personalized optimization with user's feedback
- Prediction, Learning, and Games
- Predictive online convex optimization
- Random gradient-free minimization of convex functions
- Secure Mobile Edge Computing in IoT via Collaborative Online Learning
- The multiplicative weights update method: a meta-algorithm and applications