Some reward–penalty rules for the multi-armed bandit problem which are asymptotically optimal

From MaRDI portal

Publication:4743532


DOI 10.2307/1426995, MaRDI QID Q4743532

Authors Kevin D. Glazebrook

Publication date 1983

Published in Advances in Applied Probability

Full work available at URL https://doi.org/10.2307/1426995

zbMATH Keywords

Gittins index; randomised allocation indices; mathematical learning; multi-armed bandit problem

Mathematics Subject Classification ID

Applications of Markov chains and discrete-time Markov processes on general state spaces (social mobility, learning theory, industrial processes, etc.) (60J20); Decision theory for games (91A35)
