Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems

From MaRDI portal

Publication:5396763

DOI: 10.1561/2200000024
zbMath: 1281.91051
DBLP: journals/ftml/BubeckC12
arXiv: 1204.5721
OpenAlex: W2950929549
Wikidata: Q59538563
Scholia: Q59538563
MaRDI QID: Q5396763

Sébastien Bubeck, Nicolò Cesa-Bianchi

Publication date: 3 February 2014

Published in: Foundations and Trends® in Machine Learning

Full work available at URL: https://arxiv.org/abs/1204.5721




Related Items (first 100 items)

Functional Sequential Treatment Allocation
Multi-Item Nontruthful Auctions Achieve Good Revenue
Best Arm Identification for Contaminated Bandits
Greedy Algorithm Almost Dominates in Smoothed Contextual Bandits
Unnamed Item
Bayesian Exploration: Incentivizing Exploration in Bayesian Games
An Accelerated Method for Derivative-Free Smooth Stochastic Convex Optimization
Ranking and Selection with Covariates for Personalized Decision Making
Practical Nonparametric Sampling Strategies for Quantile-Based Ordinal Optimization
Always Valid Inference: Continuous Monitoring of A/B Tests
Unifying mirror descent and dual averaging
Self-adjusting grid networks
Daisee: Adaptive importance sampling by balancing exploration and exploitation
Technical note—Knowledge gradient for selection with covariates: Consistency and computation
Online learning for scheduling MIP heuristics
Multi-armed bandit-based hyper-heuristics for combinatorial optimization problems
Distributed online bandit linear regressions with differential privacy
Dynamic Resource Allocation in the Cloud with Near-Optimal Efficiency
Multi-armed bandit problem with online clustering as side information
Nonparametric learning for impulse control problems -- exploration vs. exploitation
Asymptotic optimality of myopic ranking and selection procedures
Convergence rate analysis for optimal computing budget allocation algorithms
User-friendly Introduction to PAC-Bayes Bounds
Universal regression with adversarial responses
On strict sub-Gaussianity, optimal proxy variance and symmetry for bounded random variables
Reinforcement Learning, Bit by Bit
MULTI-ARMED BANDITS UNDER GENERAL DEPRECIATION AND COMMITMENT
Control-data separation and logical condition propagation for efficient inference on probabilistic programs
Nearly Dimension-Independent Sparse Linear Bandit over Small Action Spaces via Best Subset Selection
Constrained regret minimization for multi-criterion multi-armed bandits
Treatment recommendation with distributional targets
Online Debiasing for Adaptively Collected High-Dimensional Data With Applications to Time Series Analysis
Efficient and generalizable tuning strategies for stochastic gradient MCMC
Optimal Exploration–Exploitation in a Multi-armed Bandit Problem with Non-stationary Rewards
MNL-Bandit: A Dynamic Learning Approach to Assortment Selection
Bandits with Global Convex Constraints and Objective
Online Decision Making with High-Dimensional Covariates
Bandit-Based Task Assignment for Heterogeneous Crowdsourcing
Online Network Revenue Management Using Thompson Sampling
Tractable Sampling Strategies for Ordinal Optimization
A Knowledge Gradient Policy for Sequencing Experiments to Identify the Structure of RNA Molecules Using a Sparse Additive Belief Model
Adaptive Matching for Expert Systems with Uncertain Task Types
Optimal Online Learning for Nonlinear Belief Models Using Discrete Priors
Optimistic Monte Carlo Tree Search with Sampled Information Relaxation Dual Bounds
Unnamed Item
Randomized allocation with arm elimination in a bandit problem with covariates
Improvements and Generalizations of Stochastic Knapsack and Markovian Bandits Approximation Algorithms
Explore First, Exploit Next: The True Shape of Regret in Bandit Problems
Learning‐based iterative modular adaptive control for nonlinear systems
Unnamed Item
Derivative-free optimization methods
Bayesian Uncertainty Directed Trial Designs
Dynamic Pricing with Multiple Products and Partially Specified Demand Distribution
Learning to Optimize via Posterior Sampling
Online Learning of Nash Equilibria in Congestion Games
Nested-Batch-Mode Learning and Stochastic Optimization with An Application to Sequential MultiStage Testing in Materials Science
Unnamed Item
Unnamed Item
Unnamed Item
Learning in Repeated Auctions
Small-Loss Bounds for Online Learning with Partial Information
A Primal–Dual Learning Algorithm for Personalized Dynamic Pricing with an Inventory Constraint
Satisficing in Time-Sensitive Bandit Learning
Gradient-free two-point methods for solving stochastic nonsmooth convex optimization problems with small non-random noises
Optimal control with learning on the fly: a toy problem
Approximation algorithms for stochastic combinatorial optimization problems
Strategic conversations under imperfect information: epistemic message exchange games
LinUCB applied to Monte Carlo tree search
Adaptive large neighborhood search for mixed integer programming
Smoothness-Adaptive Contextual Bandits
Smooth Contextual Bandits: Bridging the Parametric and Nondifferentiable Regret Regimes
Setting Reserve Prices in Second-Price Auctions with Unobserved Bids
Learning in auctions: regret is hard, envy is easy
EXPLORATION–EXPLOITATION POLICIES WITH ALMOST SURE, ARBITRARILY SLOW GROWING ASYMPTOTIC REGRET
Tracking and Regret Bounds for Online Zeroth-Order Euclidean and Riemannian Optimization
Cloud Conveyors System: A Versatile Application for Exploring Cyber-Physical Systems
Stochastic online optimization. Single-point and multi-point non-linear multi-armed bandits. Convex and strongly-convex case
Unnamed Item
On minimaxity of follow the leader strategy in the stochastic setting
Kullback-Leibler upper confidence bounds for optimal sequential allocation
Distributed cooperative decision making in multi-agent multi-armed bandits
Personalized optimization with user's feedback
Unnamed Item
Worst-case regret analysis of computationally budgeted online kernel selection
Adaptive-treed bandits
Unnamed Item
Combining multiple strategies for multiarmed bandit problems and asymptotic optimality
Budget-limited distribution learning in multifidelity problems
Bandit-based Monte-Carlo structure learning of probabilistic logic programs
Regret minimization in online Bayesian persuasion: handling adversarial receiver's types under full and partial feedback models
A quality assuring, cost optimal multi-armed bandit mechanism for expertsourcing
Indifference-Zone-Free Selection of the Best
Nonstochastic Multi-Armed Bandits with Graph-Structured Feedback
Unnamed Item
Unnamed Item
Unnamed Item
Truthful learning mechanisms for multi-slot sponsored search auctions with externalities
Controlling unknown linear dynamics with bounded multiplicative regret
Efficient Ranking and Selection in Parallel Computing Environments
Multi-channel transmission scheduling with hopping scheme under uncertain channel states






