Randomized allocation with nonparametric estimation for contextual multi-armed bandits with delayed rewards
From MaRDI portal
Abstract: We study a multi-armed bandit problem with covariates in a setting where there is a possible delay in observing the rewards. Under some mild assumptions on the probability distributions for the delays and using an appropriate randomization to select the arms, the proposed strategy is shown to be strongly consistent.
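The abstract describes the strategy only at a high level. Below is a minimal sketch of the general idea: nonparametric per-arm reward estimates built only from rewards that have already arrived, combined with randomized arm selection. It is not the authors' algorithm. Everything concrete in it is an illustrative assumption: a one-dimensional covariate uniform on [0, 1], a Nadaraya-Watson kernel estimator with a fixed bandwidth, a hypothetical exploration rate min(1, n^(-1/4)), integer delays drawn uniformly from {0, ..., max_delay}, and two made-up mean-reward functions. The names `nw_estimate`, `choose_arm`, `run` and `ARM_MEANS` are hypothetical.

```python
import math
import random

# Hypothetical mean-reward functions for two arms (assumption, for illustration).
ARM_MEANS = [lambda x: x, lambda x: 1.0 - x]


def nw_estimate(xs, ys, x, bandwidth):
    """Nadaraya-Watson kernel regression estimate at x (Gaussian kernel)."""
    weights = [math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in xs]
    total = sum(weights)
    if total == 0.0:
        # Defensive fallback if every weight underflows: use the sample mean.
        return sum(ys) / len(ys)
    return sum(w * y for w, y in zip(weights, ys)) / total


def choose_arm(observed, x, n, rng, bandwidth=0.2):
    """Randomized allocation: explore with probability eps_n, else act greedily."""
    k = len(observed)
    # Force a pull of any arm that has no delivered rewards yet.
    for arm, (xs, _) in enumerate(observed):
        if not xs:
            return arm
    eps_n = min(1.0, n ** -0.25)  # hypothetical decaying exploration rate
    if rng.random() < eps_n:
        return rng.randrange(k)
    estimates = [nw_estimate(xs, ys, x, bandwidth) for xs, ys in observed]
    return max(range(k), key=lambda a: estimates[a])


def run(horizon, max_delay, seed=0, means=ARM_MEANS):
    rng = random.Random(seed)
    observed = [([], []) for _ in means]  # per arm: delivered (covariates, rewards)
    pending = []  # (arrival_step, arm, covariate, reward) not yet observed
    choices = []  # (covariate, chosen arm) for every step
    for n in range(1, horizon + 1):
        # Deliver every reward whose arrival step has been reached.
        still_pending = []
        for arrival, arm, x_old, r in pending:
            if arrival <= n:
                observed[arm][0].append(x_old)
                observed[arm][1].append(r)
            else:
                still_pending.append((arrival, arm, x_old, r))
        pending = still_pending
        x = rng.random()  # covariate for this round
        arm = choose_arm(observed, x, n, rng)
        reward = means[arm](x) + rng.gauss(0.0, 0.1)
        # Reward becomes visible at step n + delay (next step if delay is 0).
        delay = rng.randint(0, max_delay)
        pending.append((n + delay, arm, x, reward))
        choices.append((x, arm))
    return observed, choices
```

For example, `run(horizon=1000, max_delay=20, seed=1)` returns the delivered data per arm and the sequence of choices. With these made-up reward functions the better arm is 0 when x > 0.5 and 1 otherwise. Delayed rewards enter the regression data only once they arrive, so the most recent pulls are simply missing from the estimates. This is the only way the delay affects the estimator in this sketch.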
Recommendations
- Randomized allocation with nonparametric estimation for a multi-armed bandit problem with covariates
- Randomized allocation with arm elimination in a bandit problem with covariates
- Kernel estimation and model combination in a bandit problem with covariates
- The multi-armed bandit problem with covariates
- A non-parametric solution to the multi-armed bandit problem with covariates
Cites work
- scientific article; zbMATH DE number 4078557
- scientific article; zbMATH DE number 3638998
- A One-Armed Bandit Problem with a Concomitant Variable
- A Tutorial on Thompson Sampling
- Asymptotically efficient adaptive allocation rules
- Bandit algorithms
- Contextual bandits with similarity information
- Finite-time analysis of the multiarmed bandit problem
- On sequential decision problems with delayed observations
- One-armed bandit problems with covariates
- Prediction, Learning, and Games
- Randomized allocation with arm elimination in a bandit problem with covariates
- Randomized allocation with nonparametric estimation for a multi-armed bandit problem with covariates
- Regret analysis of stochastic and nonstochastic multi-armed bandit problems
- Reinforcement learning. An introduction
- Sequential Analysis with Delayed Observations
- Some aspects of the sequential design of experiments
- The multi-armed bandit problem with covariates
Cited in (9)
- Randomized allocation with nonparametric estimation for a multi-armed bandit problem with covariates
- Adaptive Algorithm for Multi-Armed Bandit Problem with High-Dimensional Covariates
- A bandit process with delayed responses
- Nonstochastic Multi-Armed Bandits with Graph-Structured Feedback
- Delay-Adaptive Learning in Generalized Linear Contextual Bandits
- Randomized allocation with arm elimination in a bandit problem with covariates
- Kernel estimation and model combination in a bandit problem with covariates
- Integrating multi-armed bandit with local search for MaxSAT
- Bernoulli multi-armed bandit problem under delayed feedback
MaRDI item Q2006767