The Nonstochastic Multiarmed Bandit Problem

From MaRDI portal

Publication:4785631

DOI: 10.1137/S0097539701398375
zbMath: 1029.68087
Wikidata: Q56560562
Scholia: Q56560562
MaRDI QID: Q4785631

Yoav Freund, Robert E. Schapire, Nicolò Cesa-Bianchi, Peter Auer

Publication date: 5 January 2003

Published in: SIAM Journal on Computing




Related Items (first 100 items shown)

Robust sequential design for piecewise-stationary multi-armed bandit problem in the presence of outliers
Crowdsourcing label quality: a theoretical analysis
A linear response bandit problem
Setting Reserve Prices in Second-Price Auctions with Unobserved Bids
Bayesian Exploration: Incentivizing Exploration in Bayesian Games
Integrated Online Learning and Adaptive Control in Queueing Systems with Uncertain Payoffs
Regret bounds for Narendra-Shapiro bandit algorithms
Feel-Good Thompson Sampling for Contextual Bandits and Reinforcement Learning
Continuous Assortment Optimization with Logit Choice Probabilities and Incomplete Information
Improved algorithms for bandit with graph feedback via regret decomposition
Dealing with expert bias in collective decision-making
A Theory of Bounded Inductive Rationality
No-regret algorithms in on-line learning, games and convex optimization
AI-driven liquidity provision in OTC financial markets
Relaxing the i.i.d. assumption: adaptively minimax optimal regret via root-entropic regularization
Optimal Exploration–Exploitation in a Multi-armed Bandit Problem with Non-stationary Rewards
Bandit-Based Task Assignment for Heterogeneous Crowdsourcing
Per-Round Knapsack-Constrained Linear Submodular Bandits
Nonstationary Bandits with Habituation and Recovery Dynamics
Learning in Combinatorial Optimization: What and How to Explore
Unnamed Item
Unnamed Item
Two-Armed Restless Bandits with Imperfect Information: Stochastic Control and Indexability
Explore First, Exploit Next: The True Shape of Regret in Bandit Problems
Nonparametric Self-Adjusting Control for Joint Learning and Optimization of Multiproduct Pricing with Finite Resource Capacity
Derivative-free optimization methods
Unnamed Item
Partial Monitoring—Classification, Regret Bounds, and Algorithms
Bypassing the Monster: A Faster and Simpler Optimal Algorithm for Contextual Bandits Under Realizability
Small-Loss Bounds for Online Learning with Partial Information
A Bandit-Learning Approach to Multifidelity Approximation
The Nonstochastic Multiarmed Bandit Problem
Achieving Unbounded Resolution in Finite Player Goore Games Using Stochastic Automata, and Its Applications
Replicator dynamics: old and new
On Learning Algorithms for Nash Equilibria
No Regret Learning in Oligopolies: Cournot vs. Bertrand
Algorithms for computing strategies in two-player simultaneous move games
Online learning in online auctions
Adaptive large neighborhood search for mixed integer programming
Improving multi-armed bandit algorithms in online pricing settings
Improved second-order bounds for prediction with expert advice
Online calibrated forecasts: memory efficiency versus universality for learning in games
Incentivizing Exploration with Heterogeneous Value of Money
Global Nash convergence of Foster and Young's regret testing
Bandit online optimization over the permutahedron
Discount Targeting in Online Social Networks Using Backpressure-Based Learning
Generalized mirror descents in congestion games
Learning Where to Attend with Deep Architectures for Image Tracking
Unnamed Item
Unnamed Item
Learning dynamic algorithm portfolios
Two queues with non-stochastic arrivals
Exploration and exploitation of scratch games
Unnamed Item
Unnamed Item
Combining multiple strategies for multiarmed bandit problems and asymptotic optimality
Value functions for depth-limited solving in zero-sum imperfect-information games
Regret minimization in online Bayesian persuasion: handling adversarial receiver's types under full and partial feedback models
Algorithm portfolio selection as a bandit problem with unbounded losses
Nonstochastic Multi-Armed Bandits with Graph-Structured Feedback
Unnamed Item
Unnamed Item
A perpetual search for talents across overlapping generations: a learning process
An asymptotically optimal policy for finite support models in the multiarmed bandit problem
Online multiple kernel classification
Pure exploration in finitely-armed and continuous-armed bandits
Chasing Ghosts: Competing with Stateful Policies
Multi-channel transmission scheduling with hopping scheme under uncertain channel states
Following the Perturbed Leader to Gamble at Multi-armed Bandits
Combinatorial bandits
Learning with stochastic inputs and adversarial outputs
The \(K\)-armed dueling bandits problem
Agent-based Modeling and Simulation of Competitive Wholesale Electricity Markets
Online Regret Bounds for Markov Decision Processes with Deterministic Transitions
Extracting certainty from uncertainty: regret bounded by variation in costs
Regret bounds for sleeping experts and bandits
Nonstochastic bandits: Countable decision set, unbounded costs and reactive environments
Regret bounds for restless Markov bandits
UCB revisited: improved regret bounds for the stochastic multi-armed bandit problem
Regret minimization in repeated matrix games with variable stage duration
Randomized prediction of individual sequences
Corruption-tolerant bandit learning
A reinforcement learning approach to interval constraint propagation
Online linear optimization and adaptive routing
Selective harvesting over networks
Dynamic pricing with finite price sets: a non-parametric approach
Filtered Poisson process bandit on a continuum
Workspace-Based Connectivity Oracle: An Adaptive Sampling Strategy for PRM Planning
Unnamed Item
Tune and mix: learning to rank using ensembles of calibrated multi-class classifiers
BoostingTree: parallel selection of weak learners in boosting, with application to ranking
Competitive collaborative learning
Exponential weight algorithm in continuous time
Online regret bounds for Markov decision processes with deterministic transitions
Bayesian adversarial multi-node bandit for optimal smart grid protection against cyber attacks
Unnamed Item
Mistake bounds on the noise-free multi-armed bandit game
A payoff-based learning procedure and its application to traffic games
Truthful Mechanisms with Implicit Payment Computation
Sequential Shortest Path Interdiction with Incomplete Information




