Finite-time analysis of the multiarmed bandit problem
Publication: 5959973
DOI: 10.1023/A:1013689704352 · zbMath: 1012.68093 · Wikidata: Q56675670 · Scholia: Q56675670 · MaRDI QID: Q5959973
Nicolò Cesa-Bianchi, Peter Auer, Paul Fischer
Publication date: 11 April 2002
Published in: Machine Learning
Computational learning theory (68Q32); Learning and adaptive systems in artificial intelligence (68T05)
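The recorded paper introduced the UCB1 index policy: play the arm maximizing the empirical mean reward plus an exploration bonus of sqrt(2 ln n / n_j), where n is the total number of plays so far and n_j the number of plays of arm j. A minimal sketch of that rule, assuming a hypothetical pull_arm(j) callback returning rewards in [0, 1], might look like this:

```python
import math
import random

def ucb1(pull_arm, n_arms, horizon):
    """Sketch of the UCB1 index policy (Auer, Cesa-Bianchi, Fischer, 2002).
    `pull_arm(j)` is a hypothetical callback returning a reward in [0, 1]."""
    counts = [0] * n_arms    # number of times each arm was played
    means = [0.0] * n_arms   # empirical mean reward of each arm

    # Initialization: play each arm once.
    for j in range(n_arms):
        counts[j] = 1
        means[j] = pull_arm(j)

    for t in range(n_arms, horizon):
        # Index = empirical mean + exploration bonus sqrt(2 ln t / n_j).
        j = max(range(n_arms),
                key=lambda a: means[a] + math.sqrt(2.0 * math.log(t) / counts[a]))
        r = pull_arm(j)
        counts[j] += 1
        means[j] += (r - means[j]) / counts[j]  # incremental mean update

    return means, counts

# Illustrative usage with Bernoulli arms.
if __name__ == "__main__":
    probs = [0.3, 0.5, 0.7]
    _, counts = ucb1(lambda j: 1.0 if random.random() < probs[j] else 0.0,
                     n_arms=len(probs), horizon=10_000)
    print(counts)  # the best arm should dominate the play counts
```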
Related Items (showing only first 100)
General game playing with stochastic CSP ⋮ A non-parametric solution to the multi-armed bandit problem with covariates ⋮ Batched bandit problems ⋮ Online machine learning algorithms to optimize performances of complex wireless communication systems ⋮ Algorithms for computing strategies in two-player simultaneous move games ⋮ Optimal control with learning on the fly: a toy problem ⋮ An analysis for strength improvement of an MCTS-based program playing Chinese dark chess ⋮ Modification of improved upper confidence bounds for regulating exploration in Monte-Carlo tree search ⋮ LinUCB applied to Monte Carlo tree search ⋮ Infomax strategies for an optimal balance between exploration and exploitation ⋮ Crowdsourcing with unsure option ⋮ Using reinforcement learning to find an optimal set of features ⋮ Adaptive large neighborhood search for mixed integer programming ⋮ Response-adaptive designs for clinical trials: simultaneous learning from multiple patients ⋮ Improving multi-armed bandit algorithms in online pricing settings ⋮ Control problems in online advertising and benefits of randomized bidding strategies ⋮ The multi-armed bandit problem with covariates ⋮ Wisdom of crowds versus groupthink: learning in groups and in isolation ⋮ Kullback-Leibler upper confidence bounds for optimal sequential allocation ⋮ Exploration and exploitation of scratch games ⋮ Hypervolume indicator and dominance reward based multi-objective Monte-Carlo tree search ⋮ Learning to steer nonlinear interior-point methods ⋮ The multi-armed bandit problem: an efficient nonparametric solution ⋮ Adaptive aggregation for reinforcement learning in average reward Markov decision processes ⋮ Robustness of stochastic bandit policies ⋮ Adaptive-treed bandits ⋮ An artificial bee colony algorithm for the job shop scheduling problem with random processing times ⋮ Combining multiple strategies for multiarmed bandit problems and asymptotic optimality ⋮ On the probability of correct selection in ordinal comparison over dynamic networks ⋮ Bayesian policy reuse ⋮ Bandit-based Monte-Carlo structure learning of probabilistic logic programs ⋮ A quality assuring, cost optimal multi-armed bandit mechanism for expertsourcing ⋮ A perpetual search for talents across overlapping generations: a learning process ⋮ An asymptotically optimal policy for finite support models in the multiarmed bandit problem ⋮ Truthful learning mechanisms for multi-slot sponsored search auctions with externalities ⋮ A comparison of Monte Carlo tree search and rolling horizon optimization for large-scale dynamic resource allocation problems ⋮ Temporal-difference search in Computer Go ⋮ Preference-based reinforcement learning: a formal framework and a policy iteration algorithm ⋮ The \(K\)-armed dueling bandits problem ⋮ A reliability-aware multi-armed bandit approach to learn and select users in demand response ⋮ Optimal Bayesian strategies for the infinite-armed Bernoulli bandit ⋮ An optimal bidimensional multi-armed bandit auction for multi-unit procurement ⋮ A dynamic programming strategy to balance exploration and exploitation in the bandit problem ⋮ Analyzing bandit-based adaptive operator selection mechanisms ⋮ Regret bounds for sleeping experts and bandits ⋮ A pricing problem with unknown arrival rate and price sensitivity ⋮ Modeling item-item similarities for personalized recommendations on Yahoo! 
front page ⋮ Information capture and reuse strategies in Monte Carlo Tree Search, with applications to games of hidden information ⋮ Regret bounds for restless Markov bandits ⋮ UCB revisited: improved regret bounds for the stochastic multi-armed bandit problem ⋮ A Monte Carlo tree search approach to finding efficient patrolling schemes on graphs ⋮ Markov decision processes with sequential sensor measurements ⋮ Corruption-tolerant bandit learning ⋮ Anytime discovery of a diverse set of patterns with Monte Carlo tree search ⋮ Ballooning multi-armed bandits ⋮ Dynamic pricing with finite price sets: a non-parametric approach ⋮ Machine learning at the service of meta-heuristics for solving combinatorial optimization problems: a state-of-the-art ⋮ Batch repair actions for automated troubleshooting ⋮ On Bayesian index policies for sequential resource allocation ⋮ Latest stored information based adaptive selection strategy for multiobjective evolutionary algorithm ⋮ A methodology for determining an effective subset of heuristics in selection hyper-heuristics ⋮ Comparison of Kriging-based algorithms for simulation optimization with heterogeneous noise ⋮ A hybrid breakout local search and reinforcement learning approach to the vertex separator problem ⋮ MSO: a framework for bound-constrained black-box global optimization algorithms ⋮ BoostingTree: parallel selection of weak learners in boosting, with application to ranking ⋮ Effective deadlock resolution with self-interested partially-rational agents ⋮ Randomized allocation with nonparametric estimation for contextual multi-armed bandits with delayed rewards ⋮ Sampled fictitious play for approximate dynamic programming ⋮ Efficient crowdsourcing of unknown experts using bounded multi-armed bandits ⋮ Online regret bounds for Markov decision processes with deterministic transitions ⋮ Active learning in heteroscedastic noise ⋮ Boundary crossing probabilities for general exponential families ⋮ An online algorithm for the risk-aware restless bandit ⋮ Response adaptive designs that incorporate switching costs and constraints ⋮ A multi-objective Monte Carlo tree search for forest harvest scheduling ⋮ Rollout sampling approximate policy iteration ⋮ Adaptive policies for perimeter surveillance problems ⋮ A survey of network interdiction models and algorithms ⋮ Analysis of Hannan consistent selection for Monte Carlo tree search in simultaneous move games ⋮ A bad arm existence checking problem: how to utilize asymmetric problem structure? 
⋮ Exploration-exploitation tradeoff using variance estimates in multi-armed bandits ⋮ Neural precedence recommender ⋮ Enhancing gene expression programming based on space partition and jump for symbolic regression ⋮ A revised approach for risk-averse multi-armed bandits under CVaR criterion ⋮ Multi-armed bandits with episode context ⋮ Regret lower bound and optimal algorithm for high-dimensional contextual linear bandit ⋮ Nonparametric Bayesian multiarmed bandits for single-cell experiment design ⋮ Two-armed bandit problem and batch version of the mirror descent algorithm ⋮ Multi-objective simultaneous optimistic optimization ⋮ Dismemberment and design for controlling the replication variance of regret for the multi-armed bandit ⋮ An asymptotically optimal strategy for constrained multi-armed bandit problems ⋮ sampling based automatic modulation classifier ⋮ Stochastic continuum-armed bandits with additive models: minimax regrets and adaptive algorithm ⋮ Multi-agent reinforcement learning: a selective overview of theories and algorithms ⋮ Fairness in learning-based sequential decision algorithms: a survey ⋮ Trading utility and uncertainty: applying the value of information to resolve the exploration-exploitation dilemma in reinforcement learning ⋮ The pure exploration problem with general reward functions depending on full distributions ⋮ Exploring search space trees using an adapted version of Monte Carlo tree search for combinatorial optimization problems ⋮ Multi-armed bandit models for the optimal design of clinical trials: benefits and challenges ⋮ On two continuum armed bandit problems in high dimensions
This page was built for publication: Finite-time analysis of the multiarmed bandit problem