An asymptotically optimal strategy for constrained multi-armed bandit problems
From MaRDI portal
Publication:784789
DOI10.1007/S00186-019-00697-3zbMATH Open1447.90022arXiv1805.01237OpenAlexW2997070617WikidataQ126414170 ScholiaQ126414170MaRDI QIDQ784789FDOQ784789
Authors: Hyeong Soo Chang
Publication date: 3 August 2020
Published in: Mathematical Methods of Operations Research (Search for Journal in Brave)
Abstract: For the stochastic multi-armed bandit (MAB) problem from a constrained model that generalizes the classical one, we show that an asymptotic optimality is achievable by a simple strategy extended from the -greedy strategy. We provide a finite-time lower bound on the probability of correct selection of an optimal near-feasible arm that holds for all time steps. Under some conditions, the bound approaches one as time goes to infinity. A particular example sequence of having the asymptotic convergence rate in the order of that holds from a sufficiently large is also discussed.
Full work available at URL: https://arxiv.org/abs/1805.01237
Recommendations
- Combining multiple strategies for multiarmed bandit problems and asymptotic optimality
- Asymptotically optimal multi-armed bandit policies under a cost constraint
- The multi-armed bandit, with constraints
- A general theory of multiarmed bandit processes with constrained arm switches
- scientific article; zbMATH DE number 4064878
multi-armed banditconstrained stochastic optimizationsimulation optimizationconstrained Markov decision process
Cites Work
- Prediction, Learning, and Games
- Probability Inequalities for Sums of Bounded Random Variables
- Introduction to Stochastic Search and Optimization
- Asymptotically efficient adaptive allocation rules
- Multi-armed bandit allocation indices. With a foreword by Peter Whittle.
- Title not available (Why is that?)
- Some aspects of the sequential design of experiments
- Finite-time analysis of the multiarmed bandit problem
- Sample average approximation of expected value constrained stochastic programs
- The multi-armed bandit, with constraints
- Penalty function with memory for discrete optimization via simulation with stochastic constraints
- Stochastically Constrained Ranking and Selection via SCORE
- Algorithms for stochastic optimization with function or expectation constraints
- Pure exploration in finitely-armed and continuous-armed bandits
- Online learning methods for networking
- Randomised allocation of treatments in sequential trials
Cited In (17)
- An Optimal Algorithm for Bandit and Zero-Order Convex Optimization with Two-Point Feedback
- Title not available (Why is that?)
- Title not available (Why is that?)
- A Simple Distribution-Free Approach to the Max k-Armed Bandit Problem
- An asymptotically optimal policy for finite support models in the multiarmed bandit problem
- Optimal activation of halting multi‐armed bandit models
- A Structured Multiarmed Bandit Problem and the Greedy Policy
- Constrained regret minimization for multi-criterion multi-armed bandits
- Asymptotically optimal multi-armed bandit policies under a cost constraint
- An index-based deterministic convergent optimal algorithm for constrained multi-armed bandit problems
- Combining multiple strategies for multiarmed bandit problems and asymptotic optimality
- A revised approach for risk-averse multi-armed bandits under CVaR criterion
- Title not available (Why is that?)
- Multi-armed bandits with censored consumption of resources
- A Note on Performance Limitations in Bandit Problems With Side Information
- On the evaluation of strategies for branching bandit processes
- Bounded Regret for Finitely Parameterized Multi-Armed Bandits
This page was built for publication: An asymptotically optimal strategy for constrained multi-armed bandit problems
Report a bug (only for logged in users!)Click here to report a bug for this page (MaRDI item Q784789)