Finite-time analysis of the multiarmed bandit problem

From MaRDI portal
Publication:5959973

DOI: 10.1023/A:1013689704352; zbMath: 1012.68093; Wikidata: Q56675670; Scholia: Q56675670; MaRDI QID: Q5959973

Nicolò Cesa-Bianchi, Peter Auer, Paul Fischer

Publication date: 11 April 2002

Published in: Machine Learning




Related Items (first 100 shown)

General game playing with stochastic CSP
A non-parametric solution to the multi-armed bandit problem with covariates
Batched bandit problems
Online machine learning algorithms to optimize performances of complex wireless communication systems
Algorithms for computing strategies in two-player simultaneous move games
Optimal control with learning on the fly: a toy problem
An analysis for strength improvement of an MCTS-based program playing Chinese dark chess
Modification of improved upper confidence bounds for regulating exploration in Monte-Carlo tree search
LinUCB applied to Monte Carlo tree search
Infomax strategies for an optimal balance between exploration and exploitation
Crowdsourcing with unsure option
Using reinforcement learning to find an optimal set of features
Adaptive large neighborhood search for mixed integer programming
Response-adaptive designs for clinical trials: simultaneous learning from multiple patients
Improving multi-armed bandit algorithms in online pricing settings
Control problems in online advertising and benefits of randomized bidding strategies
The multi-armed bandit problem with covariates
Wisdom of crowds versus groupthink: learning in groups and in isolation
Kullback-Leibler upper confidence bounds for optimal sequential allocation
Exploration and exploitation of scratch games
Hypervolume indicator and dominance reward based multi-objective Monte-Carlo tree search
Learning to steer nonlinear interior-point methods
The multi-armed bandit problem: an efficient nonparametric solution
Adaptive aggregation for reinforcement learning in average reward Markov decision processes
Robustness of stochastic bandit policies
Adaptive-treed bandits
An artificial bee colony algorithm for the job shop scheduling problem with random processing times
Combining multiple strategies for multiarmed bandit problems and asymptotic optimality
On the probability of correct selection in ordinal comparison over dynamic networks
Bayesian policy reuse
Bandit-based Monte-Carlo structure learning of probabilistic logic programs
A quality assuring, cost optimal multi-armed bandit mechanism for expertsourcing
A perpetual search for talents across overlapping generations: a learning process
An asymptotically optimal policy for finite support models in the multiarmed bandit problem
Truthful learning mechanisms for multi-slot sponsored search auctions with externalities
A comparison of Monte Carlo tree search and rolling horizon optimization for large-scale dynamic resource allocation problems
Temporal-difference search in Computer Go
Preference-based reinforcement learning: a formal framework and a policy iteration algorithm
The \(K\)-armed dueling bandits problem
A reliability-aware multi-armed bandit approach to learn and select users in demand response
Optimal Bayesian strategies for the infinite-armed Bernoulli bandit
An optimal bidimensional multi-armed bandit auction for multi-unit procurement
A dynamic programming strategy to balance exploration and exploitation in the bandit problem
Analyzing bandit-based adaptive operator selection mechanisms
Regret bounds for sleeping experts and bandits
A pricing problem with unknown arrival rate and price sensitivity
Modeling item-item similarities for personalized recommendations on Yahoo! front page
Information capture and reuse strategies in Monte Carlo Tree Search, with applications to games of hidden information
Regret bounds for restless Markov bandits
UCB revisited: improved regret bounds for the stochastic multi-armed bandit problem
A Monte Carlo tree search approach to finding efficient patrolling schemes on graphs
Markov decision processes with sequential sensor measurements
Corruption-tolerant bandit learning
Anytime discovery of a diverse set of patterns with Monte Carlo tree search
Ballooning multi-armed bandits
Dynamic pricing with finite price sets: a non-parametric approach
Machine learning at the service of meta-heuristics for solving combinatorial optimization problems: a state-of-the-art
Batch repair actions for automated troubleshooting
On Bayesian index policies for sequential resource allocation
Latest stored information based adaptive selection strategy for multiobjective evolutionary algorithm
A methodology for determining an effective subset of heuristics in selection hyper-heuristics
Comparison of Kriging-based algorithms for simulation optimization with heterogeneous noise
A hybrid breakout local search and reinforcement learning approach to the vertex separator problem
MSO: a framework for bound-constrained black-box global optimization algorithms
BoostingTree: parallel selection of weak learners in boosting, with application to ranking
Effective deadlock resolution with self-interested partially-rational agents
Randomized allocation with nonparametric estimation for contextual multi-armed bandits with delayed rewards
Sampled fictitious play for approximate dynamic programming
Efficient crowdsourcing of unknown experts using bounded multi-armed bandits
Online regret bounds for Markov decision processes with deterministic transitions
Active learning in heteroscedastic noise
Boundary crossing probabilities for general exponential families
An online algorithm for the risk-aware restless bandit
Response adaptive designs that incorporate switching costs and constraints
A multi-objective Monte Carlo tree search for forest harvest scheduling
Rollout sampling approximate policy iteration
Adaptive policies for perimeter surveillance problems
A survey of network interdiction models and algorithms
Analysis of Hannan consistent selection for Monte Carlo tree search in simultaneous move games
A bad arm existence checking problem: how to utilize asymmetric problem structure?
Exploration-exploitation tradeoff using variance estimates in multi-armed bandits
Neural precedence recommender
Enhancing gene expression programming based on space partition and jump for symbolic regression
A revised approach for risk-averse multi-armed bandits under CVaR criterion
Multi-armed bandits with episode context
Regret lower bound and optimal algorithm for high-dimensional contextual linear bandit
Nonparametric Bayesian multiarmed bandits for single-cell experiment design
Two-armed bandit problem and batch version of the mirror descent algorithm
Multi-objective simultaneous optimistic optimization
Dismemberment and design for controlling the replication variance of regret for the multi-armed bandit
An asymptotically optimal strategy for constrained multi-armed bandit problems
sampling based automatic modulation classifier
Stochastic continuum-armed bandits with additive models: minimax regrets and adaptive algorithm
Multi-agent reinforcement learning: a selective overview of theories and algorithms
Fairness in learning-based sequential decision algorithms: a survey
Trading utility and uncertainty: applying the value of information to resolve the exploration-exploitation dilemma in reinforcement learning
The pure exploration problem with general reward functions depending on full distributions
Exploring search space trees using an adapted version of Monte Carlo tree search for combinatorial optimization problems
Multi-armed bandit models for the optimal design of clinical trials: benefits and challenges
On two continuum armed bandit problems in high dimensions



