Adaptive treatment allocation and the multi-armed bandit problem

From MaRDI portal

Publication:1102059


DOI: 10.1214/aos/1176350495; zbMath: 0643.62054; OpenAlex: W1973885534; MaRDI QID: Q1102059

Tze Leung Lai

Publication date: 1987

Published in: The Annals of Statistics

Full work available at URL: https://doi.org/10.1214/aos/1176350495

zbMATH Keywords

adaptive control, boundary crossing, simulation study, dynamic allocation, asymptotic optimality, upper confidence bounds, multi-armed bandit problem, adaptive treatment allocation

Mathematics Subject Classification ID

Stopping times; optimal stopping problems; gambling theory (60G40)
Sequential statistical design (62L05)
Sequential estimation (62L12)

Related Items

Optimal learning and experimentation in bandit problems; A non-parametric solution to the multi-armed bandit problem with covariates; Infomax strategies for an optimal balance between exploration and exploitation; Bandit and covariate processes, with finite or non-denumerable set of arms; A linear response bandit problem; Optimistic Gittins Indices; MULTI-ARMED BANDITS WITH COVARIATES: THEORY AND APPLICATIONS; The multi-armed bandit problem: an efficient nonparametric solution; Encounters with Martingales in Statistics and Stochastic Optimization; Reinforcement Learning, Bit by Bit; Asymptotic optimality theory for active quickest detection with unknown postchange parameters; Empirical Gittins index strategies with ε-explorations for multi-armed bandit problems; Poissonian two-armed bandit: a new approach; Customization of J. Bather's UCB strategy for a Gaussian multiarmed bandit; A confirmation of a conjecture on Feldman’s two-armed bandit problem; Optimal Bayesian strategies for the infinite-armed Bernoulli bandit; Undiscounted bandit games; Optimal strategies for a class of sequential control problems with precedence relations; Optimal Online Learning for Nonlinear Belief Models Using Discrete Priors; An Approximation Approach for Response-Adaptive Clinical Trial Design; On Bayesian index policies for sequential resource allocation; Efficient Adaptive Randomization and Stopping Rules in Multi-arm Clinical Trials for Testing a New Treatment; An analysis of model-based interval estimation for Markov decision processes; Optimal stopping for Brownian motion with applications to sequential analysis and option pricing; On the optimal amount of experimentation in sequential decision problems; Infinite Arms Bandit: Optimality via Confidence Bounds; Boundary crossing probabilities for general exponential families; An online algorithm for the risk-aware restless bandit; Stochastic approximation: from statistical origin to big-data, multidisciplinary applications; Learning to Optimize via Information-Directed Sampling; Sequential Generalized Likelihood Ratios and Adaptive Treatment Allocation for Optimal Sequential Selection; The Valuator’s Curse: Decision Analysis of Overvaluation and Disappointment in Acquisition; Learning to Optimize via Posterior Sampling; Asymptotically optimal algorithms for budgeted multiple play bandits; Small-sample performance of Bernoulli two-armed bandit Bayesian strategies; Matrices -- compensating the loss of anschauung; Nonparametric bandit methods
