An asymptotically optimal strategy for constrained multi-armed bandit problems
From MaRDI portal
Abstract: For the stochastic multi-armed bandit (MAB) problem from a constrained model that generalizes the classical one, we show that an asymptotic optimality is achievable by a simple strategy extended from the -greedy strategy. We provide a finite-time lower bound on the probability of correct selection of an optimal near-feasible arm that holds for all time steps. Under some conditions, the bound approaches one as time goes to infinity. A particular example sequence of having the asymptotic convergence rate in the order of that holds from a sufficiently large is also discussed.
Recommendations
- Combining multiple strategies for multiarmed bandit problems and asymptotic optimality
- Asymptotically optimal multi-armed bandit policies under a cost constraint
- The multi-armed bandit, with constraints
- A general theory of multiarmed bandit processes with constrained arm switches
- scientific article; zbMATH DE number 4064878
Cites work
- scientific article; zbMATH DE number 4078557 (Why is no real title available?)
- Algorithms for stochastic optimization with function or expectation constraints
- Asymptotically efficient adaptive allocation rules
- Finite-time analysis of the multiarmed bandit problem
- Introduction to Stochastic Search and Optimization
- Multi-armed bandit allocation indices. With a foreword by Peter Whittle.
- Online learning methods for networking
- Penalty function with memory for discrete optimization via simulation with stochastic constraints
- Prediction, Learning, and Games
- Probability Inequalities for Sums of Bounded Random Variables
- Pure exploration in finitely-armed and continuous-armed bandits
- Randomised allocation of treatments in sequential trials
- Sample average approximation of expected value constrained stochastic programs
- Some aspects of the sequential design of experiments
- Stochastically Constrained Ranking and Selection via SCORE
- The multi-armed bandit, with constraints
Cited in
(17)- Bounded Regret for Finitely Parameterized Multi-Armed Bandits
- An Optimal Algorithm for Bandit and Zero-Order Convex Optimization with Two-Point Feedback
- scientific article; zbMATH DE number 4064878 (Why is no real title available?)
- scientific article; zbMATH DE number 4059270 (Why is no real title available?)
- A Simple Distribution-Free Approach to the Max k-Armed Bandit Problem
- An asymptotically optimal policy for finite support models in the multiarmed bandit problem
- Optimal activation of halting multi‐armed bandit models
- A Structured Multiarmed Bandit Problem and the Greedy Policy
- Constrained regret minimization for multi-criterion multi-armed bandits
- Asymptotically optimal multi-armed bandit policies under a cost constraint
- Combining multiple strategies for multiarmed bandit problems and asymptotic optimality
- An index-based deterministic convergent optimal algorithm for constrained multi-armed bandit problems
- A revised approach for risk-averse multi-armed bandits under CVaR criterion
- scientific article; zbMATH DE number 7596797 (Why is no real title available?)
- Multi-armed bandits with censored consumption of resources
- A Note on Performance Limitations in Bandit Problems With Side Information
- On the evaluation of strategies for branching bandit processes
This page was built for publication: An asymptotically optimal strategy for constrained multi-armed bandit problems
Report a bug (only for logged in users!)Click here to report a bug for this page (MaRDI item Q784789)