Asymptotically efficient adaptive allocation rules

Statistical ranking and selection procedures (62F07)
Sequential statistical design (62L05)
Sequential statistical analysis (62L10)
Robustness and adaptive procedures (parametric inference) (62F35)

Recommendations

scientific article; zbMATH DE number 3947447
scientific article; zbMATH DE number 410134
scientific article; zbMATH DE number 3938351
Adaptive treatment allocation and the multi-armed bandit problem
Irreversible adaptive allocation rules

Cites work

Asymptotic near admissibility and asymptotic near optimality by the “two armed bandit” problem
Some One-Sided Theorems on the Tail Distribution of Sample Sums with Applications to the Last Time and Largest Excess of Boundary Crossings
Some aspects of the sequential design of experiments

Cited in

Robustness of stochastic bandit policies
Dynamic sampling allocation and design selection
Multi-armed bandit models for the optimal design of clinical trials: benefits and challenges
Bounded Regret for Finitely Parameterized Multi-Armed Bandits
Primal-dual algorithms for optimization with stochastic dominance
scientific article; zbMATH DE number 7513920
Pure exploration in multi-armed bandits problems
Online learning and pricing for multiple products with reference price effects
On the convergence rates of expected improvement methods
Online learning for route planning with on-time arrival reliability
Sequential design with applications to the trim-loss problem
On the Prior Sensitivity of Thompson Sampling
Matrices -- compensating the loss of anschauung
scientific article; zbMATH DE number 410134
Stochastic approximation: from statistical origin to big-data, multidisciplinary applications
Optimal control with learning on the fly: a toy problem
Randomized allocation with nonparametric estimation for a multi-armed bandit problem with covariates
scientific article; zbMATH DE number 7370545
Two-armed bandit problem for parallel data processing systems
Reinforcement Learning, Bit by Bit
Learning the distribution with largest mean: two bandit frameworks
Nonasymptotic sequential tests for overlapping hypotheses applied to near-optimal arm identification in bandit models
Exploring search space trees using an adapted version of Monte Carlo tree search for combinatorial optimization problems
Per-round knapsack-constrained linear submodular bandits
Thompson sampling for networked control over unknown channels
Adaptive Algorithm for Multi-Armed Bandit Problem with High-Dimensional Covariates
Algorithm portfolios for noisy optimization
On Monte Carlo tree search for weighted vertex coloring
Branching time active inference: the theory and its generality
Pure exploration in finitely-armed and continuous-armed bandits
Online Debiasing for Adaptively Collected High-Dimensional Data With Applications to Time Series Analysis
Nonstationary bandits with habituation and recovery dynamics
The multi-armed bandit problem under the mean-variance setting
Gittins Index for Simple Family of Markov Bandit Processes with Switching Cost and No Discounting
Simple fixes that accommodate switching costs in multi-armed bandits
Adjusted expected improvement for cumulative regret minimization in noisy Bayesian optimization
Efficient Sorting in a Dynamic Adverse-Selection Model
A new regret-analysis framework for budgeted multi-armed bandits
Generalizing the regret: an analysis of lower and upper bounds
Nearly Dimension-Independent Sparse Linear Bandit over Small Action Spaces via Best Subset Selection
Batched bandit problems
Learning in combinatorial optimization: what and how to explore
Certainty equivalence control with forcing: Revisited
The multi-armed bandit problem: an efficient nonparametric solution
Learning algorithms for verification of Markov decision processes
The multi-armed bandit problem with covariates
Functional feature construction for individualized treatment regimes
Kullback-Leibler upper confidence bounds for optimal sequential allocation
Response adaptive designs that incorporate switching costs and constraints
Learning to optimize via information-directed sampling
Optimal strategies for a class of sequential control problems with precedence relations
Online collaborative filtering on graphs
Multinomial Thompson sampling for rating scales and prior considerations for calibrating uncertainty
Arbitrary side observations in bandit problems
Close the gaps: a learning-while-doing algorithm for single-product revenue management problems
Stochastic Low-Rank Tensor Bandits for Multi-Dimensional Online Decision Making
Profile-based bandit with unknown profiles
Reward maximization under uncertainty: leveraging side-observations on networks
An asymptotically optimal strategy for constrained multi-armed bandit problems
Asymptotic optimality for decentralised bandits
Nonparametric bandit methods
Optimal allocation of simulation experiments in discrete stochastic optimization and approximative algorithms
Efficient allocations under ambiguity
Small-Loss Bounds for Online Learning with Partial Information
Generalized Bandit Problems
Bandit and covariate processes, with finite or non-denumerable set of arms
UCB revisited: improved regret bounds for the stochastic multi-armed bandit problem
Ballooning multi-armed bandits
How fragile are information cascades?
On Bayesian index policies for sequential resource allocation
Adaptive policies for perimeter surveillance problems
Boundary crossing probabilities for general exponential families
An optimal bidimensional multi-armed bandit auction for multi-unit procurement
Exploration-exploitation policies with almost sure, arbitrarily slow growing asymptotic regret
Irreversible adaptive allocation rules
Satisficing in Time-Sensitive Bandit Learning
A reinforcement learning approach to personalized learning recommendation systems
Asymptotic efficiency of a sequential allocation rule
Reading policies for joins: an asymptotic analysis
Dynamic assortment personalization in high dimensions
Daisee: Adaptive importance sampling by balancing exploration and exploitation
Bandit Theory: Applications to Learning Healthcare Systems and Clinical Trials
Bandit Change-Point Detection for Real-Time Monitoring High-Dimensional Data Under Sampling Control
Tracking the mean of a piecewise stationary sequence
The \(K\)-armed dueling bandits problem
Optimal regret bounds for collaborative learning in bandits
CRIMED: lower and upper bounds on regret for bandits with unbounded stochastic corruption
On incomplete learning and certainty-equivalence control
Modification of improved upper confidence bounds for regulating exploration in Monte-Carlo tree search
Infomax strategies for an optimal balance between exploration and exploitation
Adaptive matching for expert systems with uncertain task types
Integrated online learning and adaptive control in queueing systems with uncertain payoffs
Adversarial online multi-task reinforcement learning
Complexity analysis of a countable-armed bandit problem
Follow-the-perturbed-leader achieves best-of-both-worlds for bandit problems
Online learning for traffic navigation in congested networks
On best-arm identification with a fixed budget in non-parametric multi-armed bandits
Robust control of the multi-armed bandit problem
Robust sequential design for piecewise-stationary multi-armed bandit problem in the presence of outliers
Functional Sequential Treatment Allocation

This page was built for publication: Asymptotically efficient adaptive allocation rules
