On the two-armed bandit problem with continuous time parameter and discounted rewards
Publication:3786305
DOI: 10.1080/17442508808833495
zbMath: 0643.90096
MaRDI QID: Q3786305
Publication date: 1988
Published in: Stochastics
Full work available at URL: https://doi.org/10.1080/17442508808833495
continuous-time two-armed bandit; expected discounted reward; stationary optimal policy; explicit formulae
90C40: Markov and semi-Markov decision processes
Related Items
On the two-armed bandit problem with non-observed Poissonian switching of arms
Average optimality in a Poissonian bandit with switching arms
Good signals gone bad: dynamic signalling with switched effort levels
Learning to disagree in a game of experimentation
Cites Work