Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems



Publication:5396763


DOI: 10.1561/2200000024 ⋮ zbMath: 1281.91051 ⋮ DBLP: journals/ftml/BubeckC12 ⋮ arXiv: 1204.5721 ⋮ OpenAlex: W2950929549 ⋮ Wikidata: Q59538563 ⋮ Scholia: Q59538563 ⋮ MaRDI QID: Q5396763

Sébastien Bubeck, Nicolò Cesa-Bianchi

Publication date: 3 February 2014

Published in: Foundations and Trends® in Machine Learning

Full work available at URL: https://arxiv.org/abs/1204.5721

zbMATH Keywords

optimization ⋮ online learning ⋮ reinforcement learning ⋮ game-theoretic learning ⋮ learning and statistical methods

Mathematics Subject Classification ID

Minimax procedures in statistical decision theory (62C20) ⋮ Learning and adaptive systems in artificial intelligence (68T05) ⋮ Research exposition (monographs, survey articles) pertaining to game theory, economics, and finance (91-02) ⋮ Markov and semi-Markov decision processes (90C40) ⋮ Sequential statistical design (62L05) ⋮ Rationality and learning in game theory (91A26) ⋮ Probabilistic games; gambling (91A60) ⋮ Decision theory for games (91A35)

Related Items (only showing first 100 items)

Functional Sequential Treatment Allocation ⋮ Multi-Item Nontruthful Auctions Achieve Good Revenue ⋮ Best Arm Identification for Contaminated Bandits ⋮ Greedy Algorithm Almost Dominates in Smoothed Contextual Bandits ⋮ Unnamed Item ⋮ Bayesian Exploration: Incentivizing Exploration in Bayesian Games ⋮ An Accelerated Method for Derivative-Free Smooth Stochastic Convex Optimization ⋮ Ranking and Selection with Covariates for Personalized Decision Making ⋮ Practical Nonparametric Sampling Strategies for Quantile-Based Ordinal Optimization ⋮ Always Valid Inference: Continuous Monitoring of A/B Tests ⋮ Unifying mirror descent and dual averaging ⋮ Self-adjusting grid networks ⋮ Daisee: Adaptive importance sampling by balancing exploration and exploitation ⋮ Technical note—Knowledge gradient for selection with covariates: Consistency and computation ⋮ Online learning for scheduling MIP heuristics ⋮ Multi-armed bandit-based hyper-heuristics for combinatorial optimization problems ⋮ Distributed online bandit linear regressions with differential privacy ⋮ Dynamic Resource Allocation in the Cloud with Near-Optimal Efficiency ⋮ Multi-armed bandit problem with online clustering as side information ⋮ Nonparametric learning for impulse control problems -- exploration vs. exploitation ⋮ Asymptotic optimality of myopic ranking and selection procedures ⋮ Convergence rate analysis for optimal computing budget allocation algorithms ⋮ User-friendly Introduction to PAC-Bayes Bounds ⋮ Universal regression with adversarial responses ⋮ On strict sub-Gaussianity, optimal proxy variance and symmetry for bounded random variables ⋮ Reinforcement Learning, Bit by Bit ⋮ MULTI-ARMED BANDITS UNDER GENERAL DEPRECIATION AND COMMITMENT ⋮ Control-data separation and logical condition propagation for efficient inference on probabilistic programs ⋮ Nearly Dimension-Independent Sparse Linear Bandit over Small Action Spaces via Best Subset Selection ⋮ Constrained regret minimization for multi-criterion multi-armed bandits ⋮ Treatment recommendation with distributional targets ⋮ Online Debiasing for Adaptively Collected High-Dimensional Data With Applications to Time Series Analysis ⋮ Efficient and generalizable tuning strategies for stochastic gradient MCMC ⋮ Optimal Exploration–Exploitation in a Multi-armed Bandit Problem with Non-stationary Rewards ⋮ MNL-Bandit: A Dynamic Learning Approach to Assortment Selection ⋮ Bandits with Global Convex Constraints and Objective ⋮ Online Decision Making with High-Dimensional Covariates ⋮ Bandit-Based Task Assignment for Heterogeneous Crowdsourcing ⋮ Online Network Revenue Management Using Thompson Sampling ⋮ Tractable Sampling Strategies for Ordinal Optimization ⋮ A Knowledge Gradient Policy for Sequencing Experiments to Identify the Structure of RNA Molecules Using a Sparse Additive Belief Model ⋮ Adaptive Matching for Expert Systems with Uncertain Task Types ⋮ Optimal Online Learning for Nonlinear Belief Models Using Discrete Priors ⋮ Optimistic Monte Carlo Tree Search with Sampled Information Relaxation Dual Bounds ⋮ Unnamed Item ⋮ Randomized allocation with arm elimination in a bandit problem with covariates ⋮ Improvements and Generalizations of Stochastic Knapsack and Markovian Bandits Approximation Algorithms ⋮
Explore First, Exploit Next: The True Shape of Regret in Bandit Problems ⋮ Learning‐based iterative modular adaptive control for nonlinear systems ⋮ Unnamed Item ⋮ Derivative-free optimization methods ⋮ Bayesian Uncertainty Directed Trial Designs ⋮ Dynamic Pricing with Multiple Products and Partially Specified Demand Distribution ⋮ Learning to Optimize via Posterior Sampling ⋮ Online Learning of Nash Equilibria in Congestion Games ⋮ Nested-Batch-Mode Learning and Stochastic Optimization with An Application to Sequential MultiStage Testing in Materials Science ⋮ Unnamed Item ⋮ Unnamed Item ⋮ Unnamed Item ⋮ Learning in Repeated Auctions ⋮ Small-Loss Bounds for Online Learning with Partial Information ⋮ A Primal–Dual Learning Algorithm for Personalized Dynamic Pricing with an Inventory Constraint ⋮ Satisficing in Time-Sensitive Bandit Learning ⋮ Gradient-free two-point methods for solving stochastic nonsmooth convex optimization problems with small non-random noises ⋮ Optimal control with learning on the fly: a toy problem ⋮ Approximation algorithms for stochastic combinatorial optimization problems ⋮ Strategic conversations under imperfect information: epistemic message exchange games ⋮ LinUCB applied to Monte Carlo tree search ⋮ Adaptive large neighborhood search for mixed integer programming ⋮ Smoothness-Adaptive Contextual Bandits ⋮ Smooth Contextual Bandits: Bridging the Parametric and Nondifferentiable Regret Regimes ⋮ Setting Reserve Prices in Second-Price Auctions with Unobserved Bids ⋮ Learning in auctions: regret is hard, envy is easy ⋮ EXPLORATION–EXPLOITATION POLICIES WITH ALMOST SURE, ARBITRARILY SLOW GROWING ASYMPTOTIC REGRET ⋮ Tracking and Regret Bounds for Online Zeroth-Order Euclidean and Riemannian Optimization ⋮ Cloud Conveyors System: A Versatile Application for Exploring Cyber-Physical Systems ⋮ Stochastic online optimization. Single-point and multi-point non-linear multi-armed bandits. Convex and strongly-convex case ⋮ Unnamed Item ⋮ On minimaxity of follow the leader strategy in the stochastic setting ⋮ Kullback-Leibler upper confidence bounds for optimal sequential allocation ⋮ Distributed cooperative decision making in multi-agent multi-armed bandits ⋮ Personalized optimization with user's feedback ⋮ Unnamed Item ⋮ Worst-case regret analysis of computationally budgeted online kernel selection ⋮ Adaptive-treed bandits ⋮ Unnamed Item ⋮ Combining multiple strategies for multiarmed bandit problems and asymptotic optimality ⋮ Budget-limited distribution learning in multifidelity problems ⋮ Bandit-based Monte-Carlo structure learning of probabilistic logic programs ⋮ Regret minimization in online Bayesian persuasion: handling adversarial receiver's types under full and partial feedback models ⋮ A quality assuring, cost optimal multi-armed bandit mechanism for expertsourcing ⋮ Indifference-Zone-Free Selection of the Best ⋮ Nonstochastic Multi-Armed Bandits with Graph-Structured Feedback ⋮ Unnamed Item ⋮ Unnamed Item ⋮ Unnamed Item ⋮ Truthful learning mechanisms for multi-slot sponsored search auctions with externalities ⋮ Controlling unknown linear dynamics with bounded multiplicative regret ⋮ Efficient Ranking and Selection in Parallel Computing Environments ⋮ Multi-channel transmission scheduling with hopping scheme under uncertain channel states
