Some reward–penalty rules for the multi-armed bandit problem which are asymptotically optimal
Publication:4743532
DOI: 10.2307/1426995; zbMATH Open: 0506.60067; OpenAlex: W2326969892; MaRDI QID: Q4743532; FDO: Q4743532
Author: Kevin D. Glazebrook
Publication date: 1983
Published in: Advances in Applied Probability
Full work available at URL: https://doi.org/10.2307/1426995
Subject classification (MSC): 60J20 Applications of Markov chains and discrete-time Markov processes on general state spaces (social mobility, learning theory, industrial processes, etc.); 91A35 Decision theory for games