Asymptotically efficient adaptive allocation rules

Publication:1060517

DOI: 10.1016/0196-8858(85)90002-8 · zbMath: 0568.62074 · OpenAlex: W2009551863 · Wikidata: Q56675673 · Scholia: Q56675673 · MaRDI QID: Q1060517

Tze Leung Lai, Herbert Robbins

Publication date: 1985

Published in: Advances in Applied Mathematics

Full work available at URL: https://doi.org/10.1016/0196-8858(85)90002-8
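
The underlying paper (Lai and Robbins, 1985) proved the classical logarithmic lower bound on bandit regret: any uniformly good allocation rule must sample each suboptimal arm j about log(n)/D(θ_j ‖ θ*) times, where D is the Kullback-Leibler divergence between arm j's reward distribution and the best arm's, and it constructed upper-confidence-bound index rules attaining that bound. As a minimal illustrative sketch, not the paper's exact construction, the Python snippet below implements a KL-UCB-style index policy for Bernoulli arms (a later simplification in the same spirit); the function names, the horizon, and the bisection depth are choices of this sketch.

```python
import math
import random

def kl_bernoulli(p, q):
    """Kullback-Leibler divergence between Bernoulli(p) and Bernoulli(q)."""
    eps = 1e-12  # clip away from 0/1 to avoid log(0)
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def kl_ucb_index(p_hat, n_pulls, t):
    """Largest q in [p_hat, 1] with n_pulls * kl(p_hat, q) <= log(t), via bisection."""
    target = math.log(max(t, 2)) / n_pulls
    lo, hi = p_hat, 1.0
    for _ in range(30):  # bisection depth: a choice of this sketch
        mid = (lo + hi) / 2.0
        if kl_bernoulli(p_hat, mid) <= target:
            lo = mid
        else:
            hi = mid
    return lo

def run_kl_ucb(true_means, horizon, seed=0):
    """Simulate a KL-UCB-style policy on Bernoulli arms; return pull counts and expected regret."""
    rng = random.Random(seed)
    k = len(true_means)
    pulls = [0] * k
    sums = [0.0] * k
    for i in range(k):  # pull each arm once to initialize
        sums[i] += float(rng.random() < true_means[i])
        pulls[i] += 1
    for t in range(k + 1, horizon + 1):
        # Choose the arm with the largest upper confidence index.
        i = max(range(k), key=lambda j: kl_ucb_index(sums[j] / pulls[j], pulls[j], t))
        sums[i] += float(rng.random() < true_means[i])
        pulls[i] += 1
    best = max(true_means)
    regret = sum((best - m) * n for m, n in zip(true_means, pulls))
    return pulls, regret

pulls, regret = run_kl_ucb([0.3, 0.5, 0.7], horizon=5000)
print("pulls per arm:", pulls, "expected regret:", round(regret, 1))
```

On a run like the one above, the pull counts should concentrate on the best arm while each suboptimal arm receives on the order of log(horizon) pulls, which is the qualitative behavior the Lai-Robbins bound describes.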



Related Items

Robust sequential design for piecewise-stationary multi-armed bandit problem in the presence of outliers
Functional Sequential Treatment Allocation
Best Arm Identification for Contaminated Bandits
Adaptive strategies in Kelly’s horse races model
Bounded Regret for Finitely Parameterized Multi-Armed Bandits
A linear response bandit problem
Smooth Contextual Bandits: Bridging the Parametric and Nondifferentiable Regret Regimes
Optimistic Gittins Indices
EXPLORATION–EXPLOITATION POLICIES WITH ALMOST SURE, ARBITRARILY SLOW GROWING ASYMPTOTIC REGRET
Functional Feature Construction for Individualized Treatment Regimes
MULTI-ARMED BANDITS WITH COVARIATES: THEORY AND APPLICATIONS
Bandit Theory: Applications to Learning Healthcare Systems and Clinical Trials
Bayesian Exploration: Incentivizing Exploration in Bayesian Games
Integrated Online Learning and Adaptive Control in Queueing Systems with Uncertain Payoffs
Dynamic Inventory Control with Fixed Setup Costs and Unknown Discrete Demand Distribution
Continuous Assortment Optimization with Logit Choice Probabilities and Incomplete Information
Daisee: Adaptive importance sampling by balancing exploration and exploitation
Hypothesis testing in adaptively sampled data: ART to maximize power beyond iid sampling
Encounters with Martingales in Statistics and Stochastic Optimization
Online learning of network bottlenecks via minimax paths
Multi-armed bandit problem with online clustering as side information
Variable Selection Via Thompson Sampling
Convergence rate analysis for optimal computing budget allocation algorithms
Reinforcement Learning, Bit by Bit
MULTI-ARMED BANDITS UNDER GENERAL DEPRECIATION AND COMMITMENT
ASYMPTOTICALLY OPTIMAL MULTI-ARMED BANDIT POLICIES UNDER A COST CONSTRAINT
Nearly Dimension-Independent Sparse Linear Bandit over Small Action Spaces via Best Subset Selection
Constrained regret minimization for multi-criterion multi-armed bandits
Asymptotic optimality for decentralised bandits
Treatment recommendation with distributional targets
Online Debiasing for Adaptively Collected High-Dimensional Data With Applications to Time Series Analysis
Empirical Gittins index strategies with ε-explorations for multi-armed bandit problems
Optimal Exploration–Exploitation in a Multi-armed Bandit Problem with Non-stationary Rewards
Settling the sample complexity of model-based offline reinforcement learning
Multi-armed linear bandits with latent biases
A confirmation of a conjecture on Feldman’s two-armed bandit problem
Learning the distribution with largest mean: two bandit frameworks
Finite-Time Analysis for the Knowledge-Gradient Policy
MNL-Bandit: A Dynamic Learning Approach to Assortment Selection
Estimating Dynamic Treatment Regimes in Mobile Health Using V-Learning
Per-Round Knapsack-Constrained Linear Submodular Bandits
A reinforcement learning approach to personalized learning recommendation systems
Dynamic Inventory and Price Controls Involving Unknown Demand on Discrete Nonperishable Items
Adaptive Matching for Expert Systems with Uncertain Task Types
Nonstationary Bandits with Habituation and Recovery Dynamics
Optimal Online Learning for Nonlinear Belief Models Using Discrete Priors
Learning in Combinatorial Optimization: What and How to Explore
Simple Bayesian Algorithms for Best-Arm Identification
An Approximation Approach for Response-Adaptive Clinical Trial Design
Infinite Arms Bandit: Optimality via Confidence Bounds
Multi-Armed Bandit for Species Discovery: A Bayesian Nonparametric Approach
Sequential Shortest Path Interdiction with Incomplete Information
Randomized allocation with arm elimination in a bandit problem with covariates
Learning to Optimize via Information-Directed Sampling
Explore First, Exploit Next: The True Shape of Regret in Bandit Problems
Nonparametric Self-Adjusting Control for Joint Learning and Optimization of Multiproduct Pricing with Finite Resource Capacity
On Incomplete Learning and Certainty-Equivalence Control
Sequential Generalized Likelihood Ratios and Adaptive Treatment Allocation for Optimal Sequential Selection
Derivative-free optimization methods
Generalized Bandit Problems
Pure Exploration in Multi-armed Bandits Problems
Nonasymptotic sequential tests for overlapping hypotheses applied to near-optimal arm identification in bandit models
Game of Thrones: Fully Distributed Learning for Multiplayer Bandits
Gittins Index for Simple Family of Markov Bandit Processes with Switching Cost and No Discounting
Technical Note—A Note on the Equivalence of Upper Confidence Bounds and Gittins Indices for Patient Agents
Learning Unknown Service Rates in Queues: A Multiarmed Bandit Approach
Matching While Learning
Learning to Optimize via Posterior Sampling
Sequential design with applications to the trim-loss problem
Adaptive Policies for Sequential Sampling under Incomplete Information and a Cost Constraint
Small-Loss Bounds for Online Learning with Partial Information
A Bandit-Learning Approach to Multifidelity Approximation
Satisficing in Time-Sensitive Bandit Learning
Model-based Reinforcement Learning: A Survey
Optimal learning and experimentation in bandit problems.
Algorithm portfolios for noisy optimization
A non-parametric solution to the multi-armed bandit problem with covariates
Batched bandit problems
The time until the final zero crossing of random sums with application to nonparametric bandit theory
Optimal control with learning on the fly: a toy problem
Woodroofe's one-armed bandit problem revisited
Modification of improved upper confidence bounds for regulating exploration in Monte-Carlo tree search
Infomax strategies for an optimal balance between exploration and exploitation
Bandit and covariate processes, with finite or non-denumerable set of arms
Response-adaptive designs for clinical trials: simultaneous learning from multiple patients
Improving multi-armed bandit algorithms in online pricing settings
Incentivizing Exploration with Heterogeneous Value of Money
Control problems in online advertising and benefits of randomized bidding strategies
On Monte Carlo tree search for weighted vertex coloring
The multi-armed bandit problem with covariates
Kullback-Leibler upper confidence bounds for optimal sequential allocation
Exploration and exploitation of scratch games
Distributed cooperative decision making in multi-agent multi-armed bandits
Two-armed bandit problem for parallel data processing systems
The multi-armed bandit problem: an efficient nonparametric solution
An index-based deterministic convergent optimal algorithm for constrained multi-armed bandit problems
Adaptive aggregation for reinforcement learning in average reward Markov decision processes
Robustness of stochastic bandit policies
General time consistent discounting
Input perturbations for adaptive control and learning
On adaptive linear-quadratic regulators
Primal-Dual Algorithms for Optimization with Stochastic Dominance
Bayesian policy reuse
A quality assuring, cost optimal multi-armed bandit mechanism for expertsourcing
On the Convergence Rates of Expected Improvement Methods
Good arm identification via bandit feedback
A perpetual search for talents across overlapping generations: a learning process
An asymptotically optimal policy for finite support models in the multiarmed bandit problem
Pure exploration in finitely-armed and continuous-armed bandits
Tuning Bandit Algorithms in Stochastic Environments
The K-armed dueling bandits problem
Gaussian two-armed bandit: limiting description
Online Regret Bounds for Markov Decision Processes with Deterministic Transitions
Active Learning in Multi-armed Bandits
An optimal bidimensional multi-armed bandit auction for multi-unit procurement
Analyzing bandit-based adaptive operator selection mechanisms
Regret bounds for sleeping experts and bandits
A unified framework for stochastic optimization
Modeling item-item similarities for personalized recommendations on Yahoo! front page
Regret bounds for restless Markov bandits
Near-optimal PAC bounds for discounted MDPs
UCB revisited: improved regret bounds for the stochastic multi-armed bandit problem
Adaptive sensing performance lower bounds for sparse signal detection and support estimation
Optimal strategies for a class of sequential control problems with precedence relations
Randomized prediction of individual sequences
Online linear optimization and adaptive routing
Ballooning multi-armed bandits
How fragile are information cascades?
On Bayesian index policies for sequential resource allocation
Reward-Modulated Hebbian Learning of Decision Making
Clustering in block Markov chains
MSO: a framework for bound-constrained black-box global optimization algorithms
Arbitrary side observations in bandit problems
Randomized allocation with nonparametric estimation for contextual multi-armed bandits with delayed rewards
Online regret bounds for Markov decision processes with deterministic transitions
Active learning in heteroscedastic noise
Boundary crossing probabilities for general exponential families
An online algorithm for the risk-aware restless bandit
Truthful Mechanisms with Implicit Payment Computation
Response adaptive designs that incorporate switching costs and constraints
Adaptive policies for perimeter surveillance problems
A conversation with Tze Leung Lai
Stochastic approximation: from statistical origin to big-data, multidisciplinary applications
Maximin effects in inhomogeneous large-scale data
Choosing a good toolkit. II: Bayes-rule based heuristics
On modification of population-based search algorithms for convergence in stochastic combinatorial optimization
Mechanisms with learning for stochastic multi-armed bandit problems
Multi-objective multi-armed bandit with lexicographically ordered and satisficing objectives
Certainty equivalence control with forcing: Revisited
Exploration-exploitation tradeoff using variance estimates in multi-armed bandits
Polynomial-Time Algorithms for Multiple-Arm Identification with Full-Bandit Feedback
Dynamic Assortment Personalization in High Dimensions
Bayesian Incentive-Compatible Bandit Exploration
A revised approach for risk-averse multi-armed bandits under CVaR criterion
Online Collaborative Filtering on Graphs
Optimal allocation of simulation experiments in discrete stochastic optimization and approximative algorithms
On the Prior Sensitivity of Thompson Sampling
Bandit algorithms to personalize educational chatbots
Gittins' theorem under uncertainty
Nonparametric Bayesian multiarmed bandits for single-cell experiment design
Two-armed bandit problem and batch version of the mirror descent algorithm
An asymptotically optimal strategy for constrained multi-armed bandit problems
Asymptotically optimal algorithms for budgeted multiple play bandits
Stochastic continuum-armed bandits with additive models: minimax regrets and adaptive algorithm
Asymptotically efficient strategies for a stochastic scheduling problem with order constraints.
Randomized allocation with nonparametric estimation for a multi-armed bandit problem with covariates
Trading utility and uncertainty: applying the value of information to resolve the exploration-exploitation dilemma in reinforcement learning
Robust control of the multi-armed bandit problem
Matrices – compensating the loss of anschauung
Exploring search space trees using an adapted version of Monte Carlo tree search for combinatorial optimization problems
Nonparametric bandit methods
Close the Gaps: A Learning-While-Doing Algorithm for Single-Product Revenue Management Problems
Multi-armed bandit models for the optimal design of clinical trials: benefits and challenges


