Regret bounds for Narendra-Shapiro bandit algorithms

From MaRDI portal
Publication:5086451

DOI: 10.1080/17442508.2018.1457675
zbMATH Open: 1498.60303
arXiv: 1502.04874
OpenAlex: W1693596837
MaRDI QID: Q5086451


Authors: Sébastien Gadat, Fabien Panloup, Sofiane Saadane


Publication date: 5 July 2022

Published in: Stochastics

Abstract: Narendra-Shapiro (NS) algorithms are bandit-type algorithms introduced in the sixties (with a view to applications in psychology and learning automata), whose convergence has been intensively studied in the stochastic algorithm literature. In this paper, we address the following question: are NS bandit algorithms competitive from a *regret* point of view? In our main result, we show that competitive bounds can be obtained for such algorithms in their penalized version (introduced by Lamberton and Pagès). More precisely, up to an over-penalization modification, the pseudo-regret of the penalized two-armed bandit algorithm is uniformly bounded by C√n (where C is made explicit in the paper). We also generalize existing convergence results and rates of convergence to the multi-armed case of the over-penalized bandit algorithm, including convergence, after a suitable renormalization, toward the invariant measure of a Piecewise Deterministic Markov Process (PDMP). Finally, ergodic properties of this PDMP are given in the multi-armed case.
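To make the object of study concrete, here is a minimal sketch of the classical (unpenalized) two-armed Narendra-Shapiro update on which the paper builds: the algorithm tracks a probability x of playing arm 1 and reinforces whichever arm it played when that arm yields a reward. The Bernoulli reward model, the 1/√n step-size schedule, and all parameter values are illustrative assumptions, not the paper's exact (penalized or over-penalized) scheme.

```python
import random

def ns_two_armed(p1, p2, n_steps, seed=0):
    """Classical (unpenalized) Narendra-Shapiro two-armed bandit sketch.

    x is the probability of playing arm 1; a success on the played arm
    reinforces that arm.  p1 and p2 are the (unknown) Bernoulli success
    probabilities.  The step size gamma_n = 1/sqrt(n) is an illustrative
    choice, not the schedule analyzed in the paper.
    """
    rng = random.Random(seed)
    x = 0.5                              # initial probability of arm 1
    for n in range(1, n_steps + 1):
        g = 1.0 / n ** 0.5               # step size gamma_n (assumption)
        if rng.random() < x:             # play arm 1
            if rng.random() < p1:        # success: reinforce arm 1
                x += g * (1.0 - x)
        else:                            # play arm 2
            if rng.random() < p2:        # success: reinforce arm 2
                x -= g * x
    return x
```

With p1 > p2 the update has positive drift g·x(1-x)(p1-p2) toward x = 1, so the share of plays on the better arm grows; the penalized version studied in the paper additionally pushes x away from the played arm on failures, which is what yields the uniform C√n pseudo-regret bound.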


Full work available at URL: https://arxiv.org/abs/1502.04874










Cited In (3)





