Asymptotically efficient allocation rules for the multiarmed bandit problem with multiple plays-Part II: Markovian rewards

DOI: 10.1109/TAC.1987.1104485 ⋮ zbMath: 0639.93053 ⋮ OpenAlex: W4232620022 ⋮ MaRDI QID: Q3780858

Jean Walrand, V. Anantharam, Pravin P. Varaiya

Publication date: 1987

Published in: IEEE Transactions on Automatic Control

Full work available at URL: https://doi.org/10.1109/tac.1987.1104485

zbMATH Keywords

regret function ⋮ multiarmed bandit ⋮ learning scheme ⋮ Markovian rewards

Mathematics Subject Classification ID

Adaptive control/observation systems (93C40) ⋮ Estimation and detection in stochastic control theory (93E10) ⋮ Stochastic games, stochastic differential games (91A15) ⋮ Stochastic systems in control theory (general) (93E03) ⋮ Continuous-time Markov processes on discrete state spaces (60J27) ⋮ Probabilistic games; gambling (91A60)

Related Items (8)

A perpetual search for talents across overlapping generations: a learning process ⋮ Regret bounds for restless Markov bandits ⋮ Nonstationary Bandits with Habituation and Recovery Dynamics ⋮ Arbitrary side observations in bandit problems ⋮ Efficient crowdsourcing of unknown experts using bounded multi-armed bandits ⋮ Certainty equivalence control with forcing: Revisited ⋮ Asymptotically efficient strategies for a stochastic scheduling problem with order constraints. ⋮ A Bandit-Learning Approach to Multifidelity Approximation
