Finite-time analysis of the multiarmed bandit problem
Publication: 5959973
DOI: 10.1023/A:1013689704352 · zbMath: 1012.68093 · Wikidata: Q56675670 · Scholia: Q56675670 · MaRDI QID: Q5959973
Nicolò Cesa-Bianchi, Peter Auer, Paul Fischer
Publication date: 11 April 2002
Published in: Machine Learning
Computational learning theory (68Q32); Learning and adaptive systems in artificial intelligence (68T05)
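The recorded paper introduced the UCB1 index policy: play the arm maximizing the empirical mean reward plus an exploration bonus of sqrt(2 ln n / n_j), where n is the total number of plays so far and n_j the number of plays of arm j. A minimal sketch of that rule, assuming a hypothetical pull_arm(j) callback returning rewards in [0, 1], might look like this:

```python
import math
import random

def ucb1(pull_arm, n_arms, horizon):
    """Sketch of the UCB1 index policy (Auer, Cesa-Bianchi, Fischer, 2002).
    `pull_arm(j)` is a hypothetical callback returning a reward in [0, 1]."""
    counts = [0] * n_arms    # number of times each arm was played
    means = [0.0] * n_arms   # empirical mean reward of each arm

    # Initialization: play each arm once.
    for j in range(n_arms):
        counts[j] = 1
        means[j] = pull_arm(j)

    for t in range(n_arms, horizon):
        # Index = empirical mean + exploration bonus sqrt(2 ln t / n_j).
        j = max(range(n_arms),
                key=lambda a: means[a] + math.sqrt(2.0 * math.log(t) / counts[a]))
        r = pull_arm(j)
        counts[j] += 1
        means[j] += (r - means[j]) / counts[j]  # incremental mean update

    return means, counts

# Illustrative usage with Bernoulli arms.
if __name__ == "__main__":
    probs = [0.3, 0.5, 0.7]
    _, counts = ucb1(lambda j: 1.0 if random.random() < probs[j] else 0.0,
                     n_arms=len(probs), horizon=10_000)
    print(counts)  # the best arm should dominate the play counts
```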
Related Items (showing only first 100)
General game playing with stochastic CSP ⋮ A non-parametric solution to the multi-armed bandit problem with covariates ⋮ Batched bandit problems ⋮ Online machine learning algorithms to optimize performances of complex wireless communication systems ⋮ Algorithms for computing strategies in two-player simultaneous move games ⋮ Optimal control with learning on the fly: a toy problem ⋮ An analysis for strength improvement of an MCTS-based program playing Chinese dark chess ⋮ Modification of improved upper confidence bounds for regulating exploration in Monte-Carlo tree search ⋮ LinUCB applied to Monte Carlo tree search ⋮ Infomax strategies for an optimal balance between exploration and exploitation ⋮ Crowdsourcing with unsure option ⋮ Using reinforcement learning to find an optimal set of features ⋮ Adaptive large neighborhood search for mixed integer programming ⋮ Response-adaptive designs for clinical trials: simultaneous learning from multiple patients ⋮ Improving multi-armed bandit algorithms in online pricing settings ⋮ Control problems in online advertising and benefits of randomized bidding strategies ⋮ The multi-armed bandit problem with covariates ⋮ Wisdom of crowds versus groupthink: learning in groups and in isolation ⋮ Kullback-Leibler upper confidence bounds for optimal sequential allocation ⋮ Exploration and exploitation of scratch games ⋮ Hypervolume indicator and dominance reward based multi-objective Monte-Carlo tree search ⋮ Learning to steer nonlinear interior-point methods ⋮ The multi-armed bandit problem: an efficient nonparametric solution ⋮ Adaptive aggregation for reinforcement learning in average reward Markov decision processes ⋮ Robustness of stochastic bandit policies ⋮ Adaptive-treed bandits ⋮ An artificial bee colony algorithm for the job shop scheduling problem with random processing times ⋮ Combining multiple strategies for multiarmed bandit problems and asymptotic optimality ⋮ On the probability of correct selection in ordinal comparison over dynamic networks ⋮ Bayesian policy reuse ⋮ Bandit-based Monte-Carlo structure learning of probabilistic logic programs ⋮ A quality assuring, cost optimal multi-armed bandit mechanism for expertsourcing ⋮ A perpetual search for talents across overlapping generations: a learning process ⋮ An asymptotically optimal policy for finite support models in the multiarmed bandit problem ⋮ Truthful learning mechanisms for multi-slot sponsored search auctions with externalities ⋮ A comparison of Monte Carlo tree search and rolling horizon optimization for large-scale dynamic resource allocation problems ⋮ Temporal-difference search in Computer Go ⋮ Preference-based reinforcement learning: a formal framework and a policy iteration algorithm ⋮ The \(K\)-armed dueling bandits problem ⋮ A reliability-aware multi-armed bandit approach to learn and select users in demand response ⋮ Optimal Bayesian strategies for the infinite-armed Bernoulli bandit ⋮ An optimal bidimensional multi-armed bandit auction for multi-unit procurement ⋮ A dynamic programming strategy to balance exploration and exploitation in the bandit problem ⋮ Analyzing bandit-based adaptive operator selection mechanisms ⋮ Regret bounds for sleeping experts and bandits ⋮ A pricing problem with unknown arrival rate and price sensitivity ⋮ Modeling item-item similarities for personalized recommendations on Yahoo! 
front page ⋮ Information capture and reuse strategies in Monte Carlo Tree Search, with applications to games of hidden information ⋮ Regret bounds for restless Markov bandits ⋮ UCB revisited: improved regret bounds for the stochastic multi-armed bandit problem ⋮ A Monte Carlo tree search approach to finding efficient patrolling schemes on graphs ⋮ Markov decision processes with sequential sensor measurements ⋮ Corruption-tolerant bandit learning ⋮ Anytime discovery of a diverse set of patterns with Monte Carlo tree search ⋮ Ballooning multi-armed bandits ⋮ Dynamic pricing with finite price sets: a non-parametric approach ⋮ Machine learning at the service of meta-heuristics for solving combinatorial optimization problems: a state-of-the-art ⋮ Batch repair actions for automated troubleshooting ⋮ On Bayesian index policies for sequential resource allocation ⋮ Latest stored information based adaptive selection strategy for multiobjective evolutionary algorithm ⋮ A methodology for determining an effective subset of heuristics in selection hyper-heuristics ⋮ Comparison of Kriging-based algorithms for simulation optimization with heterogeneous noise ⋮ A hybrid breakout local search and reinforcement learning approach to the vertex separator problem ⋮ MSO: a framework for bound-constrained black-box global optimization algorithms ⋮ BoostingTree: parallel selection of weak learners in boosting, with application to ranking ⋮ Effective deadlock resolution with self-interested partially-rational agents ⋮ Randomized allocation with nonparametric estimation for contextual multi-armed bandits with delayed rewards ⋮ Sampled fictitious play for approximate dynamic programming ⋮ Efficient crowdsourcing of unknown experts using bounded multi-armed bandits ⋮ Online regret bounds for Markov decision processes with deterministic transitions ⋮ Active learning in heteroscedastic noise ⋮ Boundary crossing probabilities for general exponential families ⋮ An online algorithm for the risk-aware restless bandit ⋮ Response adaptive designs that incorporate switching costs and constraints ⋮ A multi-objective Monte Carlo tree search for forest harvest scheduling ⋮ Rollout sampling approximate policy iteration ⋮ Adaptive policies for perimeter surveillance problems ⋮ A survey of network interdiction models and algorithms ⋮ Analysis of Hannan consistent selection for Monte Carlo tree search in simultaneous move games ⋮ A bad arm existence checking problem: how to utilize asymmetric problem structure? 
⋮ Exploration-exploitation tradeoff using variance estimates in multi-armed bandits ⋮ Neural precedence recommender ⋮ Enhancing gene expression programming based on space partition and jump for symbolic regression ⋮ A revised approach for risk-averse multi-armed bandits under CVaR criterion ⋮ Multi-armed bandits with episode context ⋮ Regret lower bound and optimal algorithm for high-dimensional contextual linear bandit ⋮ Nonparametric Bayesian multiarmed bandits for single-cell experiment design ⋮ Two-armed bandit problem and batch version of the mirror descent algorithm ⋮ Multi-objective simultaneous optimistic optimization ⋮ Dismemberment and design for controlling the replication variance of regret for the multi-armed bandit ⋮ An asymptotically optimal strategy for constrained multi-armed bandit problems ⋮ sampling based automatic modulation classifier ⋮ Stochastic continuum-armed bandits with additive models: minimax regrets and adaptive algorithm ⋮ Multi-agent reinforcement learning: a selective overview of theories and algorithms ⋮ Fairness in learning-based sequential decision algorithms: a survey ⋮ Trading utility and uncertainty: applying the value of information to resolve the exploration-exploitation dilemma in reinforcement learning ⋮ The pure exploration problem with general reward functions depending on full distributions ⋮ Exploring search space trees using an adapted version of Monte Carlo tree search for combinatorial optimization problems ⋮ Multi-armed bandit models for the optimal design of clinical trials: benefits and challenges ⋮ On two continuum armed bandit problems in high dimensions
This page was built for publication: Finite-time analysis of the multiarmed bandit problem