The multi-armed bandit problem with covariates
From MaRDI portal
(Redirected from Publication:355096)
Abstract: We consider a multi-armed bandit problem in a setting where each arm produces a noisy reward realization which depends on an observable random covariate. As opposed to the traditional static multi-armed bandit problem, this setting allows for dynamically changing rewards that better describe applications where side information is available. We adopt a nonparametric model where the expected rewards are smooth functions of the covariate and where the hardness of the problem is captured by a margin parameter. To maximize the expected cumulative reward, we introduce a policy called Adaptively Binned Successive Elimination (abse) that adaptively decomposes the global problem into suitably "localized" static bandit problems. This policy constructs an adaptive partition using a variant of the Successive Elimination (se) policy. Our results include sharper regret bounds for the se policy in a static bandit problem and minimax optimal regret bounds for the abse policy in the dynamic problem.
Recommendations
- A non-parametric solution to the multi-armed bandit problem with covariates
- Randomized allocation with nonparametric estimation for a multi-armed bandit problem with covariates
- Randomized allocation with arm elimination in a bandit problem with covariates
- Kernel estimation and model combination in a bandit problem with covariates
- Bandit and covariate processes, with finite or non-denumerable set of arms
Cites work
- scientific article; zbMATH DE number 3746280 (Why is no real title available?)
- A Note on Performance Limitations in Bandit Problems With Side Information
- A One-Armed Bandit Problem with a Concomitant Variable
- Action elimination and stopping conditions for the multi-armed bandit and reinforcement learning problems
- An Asymptotic Minimax Theorem for the Two Armed Bandit Problem
- Asymptotically efficient adaptive allocation rules
- Contextual bandits with similarity information
- Fast learning rates for plug-in classifiers
- Finite-time analysis of the multiarmed bandit problem
- Online Learning with Prior Knowledge
- Optimal aggregation of classifiers in statistical learning.
- Prediction, Learning, and Games
- Randomized allocation with nonparametric estimation for a multi-armed bandit problem with covariates
- Regret bounds and minimax policies under partial monitoring
- Smooth discrimination analysis
- Some aspects of the sequential design of experiments
- UCB revisited: improved regret bounds for the stochastic multi-armed bandit problem
- Woodroofe's one-armed bandit problem revisited
Cited in
(42)- Instrument-armed bandits
- One-armed bandit problems with covariates
- Randomized allocation with nonparametric estimation for a multi-armed bandit problem with covariates
- time-decaying bandits for non-stationary problems
- Learning the distribution with largest mean: two bandit frameworks
- Bayesian adaptive bandit-based designs using the Gittins index for multi-armed trials with normally distributed endpoints
- Adaptive Algorithm for Multi-Armed Bandit Problem with High-Dimensional Covariates
- A Bayesian two-armed bandit model
- Batched bandit problems
- Arbitrary side observations in bandit problems
- Reward maximization under uncertainty: leveraging side-observations on networks
- Online decision making with high-dimensional covariates
- Bandit and covariate processes, with finite or non-denumerable set of arms
- Dynamic assortment personalization in high dimensions
- Technical note—Knowledge gradient for selection with covariates: Consistency and computation
- scientific article; zbMATH DE number 7370520 (Why is no real title available?)
- Statistical inference for online decision making: in a contextual bandit setting
- Nonparametric pricing analytics with customer covariates
- Nonstochastic Multi-Armed Bandits with Graph-Structured Feedback
- A linear response bandit problem
- A Single-Index Model With a Surface-Link for Optimizing Individualized Dose Rules
- Online learning of Nash equilibria in congestion games
- Transfer learning for contextual multi-armed bandits
- The \(k\)-nearest neighbour UCB algorithm for multi-armed bandits with covariates
- Randomized allocation with arm elimination in a bandit problem with covariates
- Ranking and Selection with Covariates for Personalized Decision Making
- Treatment recommendation with distributional targets
- A revised approach for risk-averse multi-armed bandits under CVaR criterion
- Kernel estimation and model combination in a bandit problem with covariates
- Infinite Arms Bandit: Optimality via Confidence Bounds
- Multi-armed bandit problem with online clustering as side information
- Smoothness-Adaptive Contextual Bandits
- Smooth Contextual Bandits: Bridging the Parametric and Nondifferentiable Regret Regimes
- An adaptive multiclass nearest neighbor classifier
- Woodroofe's one-armed bandit problem revisited
- scientific article; zbMATH DE number 6982311 (Why is no real title available?)
- Functional sequential treatment allocation with covariates
- One-armed bandit process with a covariate
- Learning in repeated auctions
- Gaussian process bandits with adaptive discretization
- Randomized allocation with nonparametric estimation for contextual multi-armed bandits with delayed rewards
- A non-parametric solution to the multi-armed bandit problem with covariates
This page was built for publication: The multi-armed bandit problem with covariates
Report a bug (only for logged in users!)Click here to report a bug for this page (MaRDI item Q355096)