The Nonstochastic Multiarmed Bandit Problem

From MaRDI portal
Publication: 4785631

DOI: 10.1137/S0097539701398375
zbMath: 1029.68087
Wikidata: Q56560562 (Scholia: Q56560562)
MaRDI QID: Q4785631

Peter Auer, Nicolò Cesa-Bianchi, Yoav Freund, Robert E. Schapire

Publication date: 5 January 2003

Published in: SIAM Journal on Computing




Related Items

The Nonstochastic Multiarmed Bandit Problem
Achieving Unbounded Resolution in Finite Player Goore Games Using Stochastic Automata, and Its Applications
Replicator dynamics: old and new
On Learning Algorithms for Nash Equilibria
No Regret Learning in Oligopolies: Cournot vs. Bertrand
Algorithms for computing strategies in two-player simultaneous move games
Online learning in online auctions
Adaptive large neighborhood search for mixed integer programming
Improving multi-armed bandit algorithms in online pricing settings
Improved second-order bounds for prediction with expert advice
Online calibrated forecasts: memory efficiency versus universality for learning in games
Incentivizing Exploration with Heterogeneous Value of Money
Global Nash convergence of Foster and Young's regret testing
Bandit online optimization over the permutahedron
Discount Targeting in Online Social Networks Using Backpressure-Based Learning
Generalized mirror descents in congestion games
Learning Where to Attend with Deep Architectures for Image Tracking
Learning dynamic algorithm portfolios
Two queues with non-stochastic arrivals
Exploration and exploitation of scratch games
Combining multiple strategies for multiarmed bandit problems and asymptotic optimality
Value functions for depth-limited solving in zero-sum imperfect-information games
Regret minimization in online Bayesian persuasion: handling adversarial receiver's types under full and partial feedback models
Algorithm portfolio selection as a bandit problem with unbounded losses
Nonstochastic Multi-Armed Bandits with Graph-Structured Feedback
A perpetual search for talents across overlapping generations: a learning process
An asymptotically optimal policy for finite support models in the multiarmed bandit problem
Online multiple kernel classification
Pure exploration in finitely-armed and continuous-armed bandits
Chasing Ghosts: Competing with Stateful Policies
Multi-channel transmission scheduling with hopping scheme under uncertain channel states
Following the Perturbed Leader to Gamble at Multi-armed Bandits
Combinatorial bandits
Learning with stochastic inputs and adversarial outputs
The \(K\)-armed dueling bandits problem
Agent-based Modeling and Simulation of Competitive Wholesale Electricity Markets
Online Regret Bounds for Markov Decision Processes with Deterministic Transitions
Extracting certainty from uncertainty: regret bounded by variation in costs
Regret bounds for sleeping experts and bandits
Nonstochastic bandits: Countable decision set, unbounded costs and reactive environments
Regret bounds for restless Markov bandits
UCB revisited: improved regret bounds for the stochastic multi-armed bandit problem
Regret minimization in repeated matrix games with variable stage duration
Randomized prediction of individual sequences
Corruption-tolerant bandit learning
A reinforcement learning approach to interval constraint propagation
Online linear optimization and adaptive routing
Selective harvesting over networks
Dynamic pricing with finite price sets: a non-parametric approach
Filtered Poisson process bandit on a continuum
Workspace-Based Connectivity Oracle: An Adaptive Sampling Strategy for PRM Planning
Tune and mix: learning to rank using ensembles of calibrated multi-class classifiers
BoostingTree: parallel selection of weak learners in boosting, with application to ranking
Competitive collaborative learning
Exponential weight algorithm in continuous time
Online regret bounds for Markov decision processes with deterministic transitions
Bayesian adversarial multi-node bandit for optimal smart grid protection against cyber attacks
Mistake bounds on the noise-free multi-armed bandit game
A payoff-based learning procedure and its application to traffic games
Truthful Mechanisms with Implicit Payment Computation
Sequential Shortest Path Interdiction with Incomplete Information
New bounds on the price of bandit feedback for mistake-bounded online multiclass learning
OPTIMUM ENERGY FOR ENERGY PACKET NETWORKS
Multi-armed bandits based on a variant of simulated annealing
Sequential Interdiction with Incomplete Information and Learning
Analysis of Hannan consistent selection for Monte Carlo tree search in simultaneous move games
A bad arm existence checking problem: how to utilize asymmetric problem structure?
Gorthaur-EXP3: bandit-based selection from a portfolio of recommendation algorithms balancing the accuracy-diversity dilemma
Computational Randomness from Generalized Hardcore Sets
Reinforcement with Fading Memories
Bayesian Incentive-Compatible Bandit Exploration
Perspectives on multiagent learning
Multi-agent learning for engineers
On the Prior Sensitivity of Thompson Sampling
Pure Exploration in Multi-armed Bandits Problems
Multi-armed bandits with episode context
Online Learning in Markov Decision Processes with Continuous Actions
On the stability of an adaptive learning dynamics in traffic games
Online Learning over a Finite Action Set with Limited Switching
Gittins' theorem under uncertainty
Dismemberment and design for controlling the replication variance of regret for the multi-armed bandit
Stochastic continuum-armed bandits with additive models: minimax regrets and adaptive algorithm
Multi-agent reinforcement learning: a selective overview of theories and algorithms
Trading utility and uncertainty: applying the value of information to resolve the exploration-exploitation dilemma in reinforcement learning
Robust control of the multi-armed bandit problem
Exponential weight approachability, applications to calibration and regret minimization
Reinforcement Learning Based Interactive Agent for Personalized Mathematical Skill Enhancement
Doubly robust policy evaluation and optimization
Close the Gaps: A Learning-While-Doing Algorithm for Single-Product Revenue Management Problems
On two continuum armed bandit problems in high dimensions
MedleySolver: online SMT algorithm selection
Robust sequential design for piecewise-stationary multi-armed bandit problem in the presence of outliers
Crowdsourcing label quality: a theoretical analysis
A linear response bandit problem
Setting Reserve Prices in Second-Price Auctions with Unobserved Bids
Bayesian Exploration: Incentivizing Exploration in Bayesian Games
Integrated Online Learning and Adaptive Control in Queueing Systems with Uncertain Payoffs
Regret bounds for Narendra-Shapiro bandit algorithms
Feel-Good Thompson Sampling for Contextual Bandits and Reinforcement Learning
Continuous Assortment Optimization with Logit Choice Probabilities and Incomplete Information
Improved algorithms for bandit with graph feedback via regret decomposition
Dealing with expert bias in collective decision-making
A Theory of Bounded Inductive Rationality
No-regret algorithms in on-line learning, games and convex optimization
AI-driven liquidity provision in OTC financial markets
Relaxing the i.i.d. assumption: adaptively minimax optimal regret via root-entropic regularization
Optimal Exploration–Exploitation in a Multi-armed Bandit Problem with Non-stationary Rewards
Bandit-Based Task Assignment for Heterogeneous Crowdsourcing
Per-Round Knapsack-Constrained Linear Submodular Bandits
Nonstationary Bandits with Habituation and Recovery Dynamics
Learning in Combinatorial Optimization: What and How to Explore
Two-Armed Restless Bandits with Imperfect Information: Stochastic Control and Indexability
Explore First, Exploit Next: The True Shape of Regret in Bandit Problems
Nonparametric Self-Adjusting Control for Joint Learning and Optimization of Multiproduct Pricing with Finite Resource Capacity
Derivative-free optimization methods
Partial Monitoring—Classification, Regret Bounds, and Algorithms
Bypassing the Monster: A Faster and Simpler Optimal Algorithm for Contextual Bandits Under Realizability
Small-Loss Bounds for Online Learning with Partial Information
A Bandit-Learning Approach to Multifidelity Approximation



Cites Work