Regret bounds for Narendra-Shapiro bandit algorithms
From MaRDI portal
Publication:5086451
Abstract: Narendra-Shapiro (NS) algorithms are bandit-type algorithms introduced in the sixties (with a view to applications in psychology and learning automata), whose convergence has been intensively studied in the stochastic-algorithm literature. In this paper, we address the following question: are Narendra-Shapiro (NS) bandit algorithms competitive from a regret point of view? In our main result, we show that competitive bounds can be obtained for such algorithms in their penalized version (introduced by Lamberton and Pagès). More precisely, up to an over-penalization modification, the pseudo-regret of the penalized two-armed bandit algorithm is uniformly bounded by a quantity made explicit in the paper.
We also generalize existing convergence results, and the associated rates of convergence, to the multi-armed case of the over-penalized bandit algorithm, including convergence, after a suitable renormalization, toward the invariant measure of a Piecewise Deterministic Markov Process (PDMP). Finally, ergodic properties of this PDMP are given in the multi-armed case.
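To make the object of study concrete, the following is a minimal sketch of a penalized two-armed NS scheme in the spirit described above: the played arm is reinforced on a success and penalized on a failure. The specific step sizes `gamma`, penalty weights `rho`, and arm probabilities are illustrative assumptions, not the paper's exact parameterization or its over-penalized variant.

```python
import random

def penalized_ns_two_armed(p, n_steps, seed=0):
    """Sketch of a penalized two-armed Narendra-Shapiro bandit.

    p       : (p1, p2), Bernoulli success probabilities of the two arms
    n_steps : number of pulls
    x       : probability of pulling arm 1 at each step

    On a success the played arm is reinforced; on a failure it is
    penalized (the penalization mechanism studied in the penalized NS
    literature). The polynomial step and penalty sequences below are
    illustrative choices only.
    """
    rng = random.Random(seed)
    x = 0.5  # start with both arms equally likely
    for n in range(1, n_steps + 1):
        gamma = 1.0 / (n + 1) ** 0.9   # step size (assumed schedule)
        rho = 1.0 / (n + 1) ** 0.3     # penalty weight (assumed schedule)
        arm = 0 if rng.random() < x else 1
        success = rng.random() < p[arm]
        if arm == 0:
            # reinforce arm 1 on success, penalize it on failure
            x = x + gamma * (1 - x) if success else x - gamma * rho * x
        else:
            # reinforce arm 2 on success, penalize it on failure
            x = x - gamma * x if success else x + gamma * rho * (1 - x)
        x = min(max(x, 0.0), 1.0)  # guard against rounding outside [0, 1]
    return x

x_final = penalized_ns_two_armed((0.8, 0.3), 20000)
```

By construction `x` remains a probability in [0, 1]; with one arm clearly better, the scheme tends to concentrate mass on it as the step sizes decrease.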
Recommendations
- Bounded Regret for Finitely Parameterized Multi-Armed Bandits
- Regret and Convergence Bounds for a Class of Continuum-Armed Bandit Problems
- Regret lower bound and optimal algorithm for high-dimensional contextual linear bandit
- Regret bounds for restless Markov bandits
- Regret Bounds for Restless Markov Bandits
- Regret analysis of stochastic and nonstochastic multi-armed bandit problems
- Near-optimal regret bounds for reinforcement learning
- Near-optimal regret bounds for Thompson sampling
- Deviations of stochastic bandit regret
- Constrained regret minimization for multi-criterion multi-armed bandits
Cites work
- scientific article; zbMATH DE number 3951715
- scientific article; zbMATH DE number 976356
- scientific article; zbMATH DE number 1972910
- 10.1162/153244303321897663
- A penalized bandit algorithm
- How Fast Is the Bandit?
- Kullback-Leibler upper confidence bounds for optimal sequential allocation
- Long time behavior of Markov processes and beyond
- On the linear model with two absorbing barriers
- Some aspects of the sequential design of experiments
- Stochastic approximation methods for constrained and unconstrained systems
- The Nonstochastic Multiarmed Bandit Problem
- Total variation estimates for the TCP process
- Use of Stochastic Automata for Parameter Self-Optimization with Multimodal Performance Criteria
- When can the two-armed bandit algorithm be trusted?
Cited in (3)