Multi-armed bandit processes with optimal selection of the operating times (Q2387146)

A multi-armed bandit problem is considered in which, at each decision epoch, one must decide both the next project to be undertaken and the span of time to be spent on it, instead of reconsidering the choice of project at each stage. This extended model, inspired by sequentially planned decision procedures [\textit{W. Schmitz}, Optimal sequentially planned decision procedures. Lect. Notes Stat. 79. New York: Springer-Verlag (1993; Zbl 0771.62057)], is formulated in Section 1 and seeks to exploit the cost reduction obtained from longer periods dedicated to the same activity. Following the method of \textit{P. Whittle} [J. R. Stat. Soc., Ser. B 42, 143--149 (1980; Zbl 0439.90096)], Section 2 introduces a retirement option with a variable reward \(M\), and Section 3 extends Gittins indexes to this case. A further relevant conclusion is that the optimal period of activity for each project does not depend on the retirement reward \(M\). Finally, it is shown that the optimal strategy is to choose the project with the highest Gittins index.
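As a rough illustration of the retirement-option idea, the following sketch computes a Gittins index for the classical setting only: a single project modelled as a finite-state Markov chain with one-step decisions. It does not implement the extended model with chosen operating times. The function names, the reward vector `r`, the transition matrix `P`, and the discount factor `beta` are assumptions introduced here for illustration. The index of state \(x\) is taken as \((1-\beta)\) times the smallest retirement reward \(M\) at which retiring in \(x\) is optimal.

```python
import numpy as np


def retirement_value(r, P, beta, M, tol=1e-10, max_iter=100_000):
    """Value iteration for V(x) = max(M, r(x) + beta * sum_y P[x, y] * V(y))."""
    V = np.full(len(r), float(M))
    for _ in range(max_iter):
        V_new = np.maximum(M, r + beta * (P @ V))
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
    return V


def gittins_index(r, P, beta, x, tol=1e-8):
    """Bisection on the retirement reward M for state x (illustrative sketch)."""
    # The indifference value of M lies between these two bounds.
    lo = r.min() / (1.0 - beta)
    hi = r.max() / (1.0 - beta)
    while hi - lo > tol:
        M = 0.5 * (lo + hi)
        V = retirement_value(r, P, beta, M)
        continue_value = r[x] + beta * (P[x] @ V)
        if continue_value > M:
            lo = M  # continuing still beats retiring: the threshold is higher
        else:
            hi = M  # retiring is already optimal: the threshold is not higher
    # Rescale the lump-sum reward to a per-period reward rate.
    return (1.0 - beta) * 0.5 * (lo + hi)


# Hypothetical two-state project: state 0 pays nothing and moves to state 1,
# which is absorbing and pays 1 per period.
r = np.array([0.0, 1.0])
P = np.array([[0.0, 1.0],
              [0.0, 1.0]])
indices = [gittins_index(r, P, 0.9, x) for x in range(2)]
print(indices)  # approximately [0.9, 1.0]
```

With several such projects, the classical index policy would operate the project whose current state has the largest index. The review states that the analogous rule remains optimal in the extended model.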


zbMATH Keywords

multi-armed bandit processes


Gittins index


MaRDI profile type

MaRDI publication profile


cites work

Multi-armed bandits with switching penalties
