Optimal Exploration–Exploitation in a Multi-armed Bandit Problem with Non-stationary Rewards
DOI: 10.1287/stsy.2019.0033 · zbMath: 1447.93371 · arXiv: 1405.3316 · OpenAlex: W2962821829 · Wikidata: Q126855665 · Scholia: Q126855665 · MaRDI QID: Q5113912
Authors: Omar Besbes, Yonatan Gur, Assaf J. Zeevi
Publication date: 18 June 2020
Published in: Stochastic Systems
Full work available at URL: https://arxiv.org/abs/1405.3316
Related Items (7)
- Robust sequential design for piecewise-stationary multi-armed bandit problem in the presence of outliers
- Fully probabilistic design of strategies with estimator
- Setting Reserve Prices in Second-Price Auctions with Unobserved Bids
- Lipschitzness is all you need to tame off-policy generative adversarial imitation learning
- Unnamed Item
- Model-based preference quantification
- Bayesian adversarial multi-node bandit for optimal smart grid protection against cyber attacks
Cites Work
- Regret bounds for restless Markov bandits
- An analog of the minimax theorem for vector payoffs
- Asymptotically efficient adaptive allocation rules
- Arm-acquiring bandits
- A decision-theoretic generalization of on-line learning and an application to boosting
- Regret in the on-line decision problem
- Non-Stationary Stochastic Optimization
- On Upper-Confidence Bound Policies for Switching Bandit Problems
- Dynamic Assortment with Demand Learning for Seasonal Consumer Goods
- Restless Bandits, Linear Programming Relaxations, and a Primal-Dual Index Heuristic
- The Nonstochastic Multiarmed Bandit Problem
- Learning and Strategic Pricing
- Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems
- Prediction, Learning, and Games
- Some aspects of the sequential design of experiments
- Finite-time analysis of the multiarmed bandit problem