The learning component of dynamic allocation indices
Publication:1206729
DOI: 10.1214/aos/1176348788; zbMath: 0760.62080; MaRDI QID: Q1206729
Publication date: 1 April 1993
Published in: The Annals of Statistics
Full work available at URL: https://doi.org/10.1214/aos/1176348788
Gittins index; exponential discounting; normal distributions; multiarmed bandit problem; Bernoulli reward process; dynamical allocation index; expected immediate reward; learning component; optimal allocation rule; target processes; upward adjustment
62C10: Bayesian problems; characterization of Bayes procedures
62L99: Sequential statistical methods
93E20: Optimal stochastic control
90C40: Markov and semi-Markov decision processes
62L05: Sequential statistical design
Related Items
Generalized two-stage bandit problem, The prediction distribution for the heteroscedastic multivariate linear models, Multi-armed bandit models for the optimal design of clinical trials: benefits and challenges, One-armed bandit process with a covariate, Small-sample performance of Bernoulli two-armed bandit Bayesian strategies, Bandit bounds from stochastic variability extrema, Optimal learning with non-Gaussian rewards