The multi-armed bandit problem: an efficient nonparametric solution
From MaRDI portal
Abstract: Lai and Robbins (1985) and Lai (1987) provided efficient parametric solutions to the multi-armed bandit problem, showing that arm allocation via upper confidence bounds (UCB) achieves minimum regret. These bounds are constructed from the Kullback-Leibler information of the reward distributions, estimated from specified parametric families. In recent years there has been renewed interest in the multi-armed bandit problem due to new applications in machine learning algorithms and data analytics. Non-parametric arm allocation procedures like -greedy, Boltzmann exploration and BESA were studied, and modified versions of the UCB procedure were also analyzed under non-parametric settings. However unlike UCB these non-parametric procedures are not efficient under general parametric settings. In this paper we propose efficient non-parametric procedures.
Recommendations
Cites work
- scientific article; zbMATH DE number 1614382 (Why is no real title available?)
- scientific article; zbMATH DE number 4078557 (Why is no real title available?)
- scientific article; zbMATH DE number 3638998 (Why is no real title available?)
- A Bernoulli Two-armed Bandit
- A new approach to the design of reinforcement schemes for learning automata
- Adaptive treatment allocation and the multi-armed bandit problem
- Asymptotically Efficient Adaptive Choice of Control Laws inControlled Markov Chains
- Asymptotically efficient adaptive allocation rules
- Asymptotically efficient adaptive allocation schemes for controlled Markov chains: finite parameter space
- Finite-time analysis of the multiarmed bandit problem
- Kullback-Leibler upper confidence bounds for optimal sequential allocation
- Nonparametric bandit methods
- Optimal adaptive policies for sequential allocation problems
- Optimal learning and experimentation in bandit problems.
- Optimal stopping and dynamic allocation
- Reinforcement learning. An introduction
- Sample mean based index policies by O(log n) regret for the multi-armed bandit problem
- Some Remarks on the Two-Armed Bandit
Cited in
(16)- Adaptive Algorithm for Multi-Armed Bandit Problem with High-Dimensional Covariates
- scientific article; zbMATH DE number 7380836 (Why is no real title available?)
- Nonparametric bandit methods
- A Simple Distribution-Free Approach to the Max k-Armed Bandit Problem
- A Structured Multiarmed Bandit Problem and the Greedy Policy
- Nonstochastic Multi-Armed Bandits with Graph-Structured Feedback
- Implicitly normalized forecaster with clipping for linear and non-linear heavy-tailed multi-armed bandits
- On Solving Finite State Multi-Armed Bandit Problem by Linear Programming
- Polynomial-Time Algorithms for Multiple-Arm Identification with Full-Bandit Feedback
- Infinite Arms Bandit: Optimality via Confidence Bounds
- A comparative study of ad hoc techniques and evolutionary methods for multi-armed bandit problems
- On the bias, risk, and consistency of sample means in multi-armed bandits
- The Nonstochastic Multiarmed Bandit Problem
- The Irrevocable Multiarmed Bandit Problem
- Optimal exploration-exploitation in a multi-armed bandit problem with non-stationary rewards
- A non-parametric solution to the multi-armed bandit problem with covariates
This page was built for publication: The multi-armed bandit problem: an efficient nonparametric solution
Report a bug (only for logged in users!)Click here to report a bug for this page (MaRDI item Q2176624)