Bandit Algorithms
From MaRDI portal
Publication:5109247
DOI: 10.1017/9781108571401 · zbMath: 1439.68002 · OpenAlex: W4206530644 · MaRDI QID: Q5109247
Tor Lattimore, Csaba Szepesvári
Publication date: 11 May 2020
Full work available at URL: https://doi.org/10.1017/9781108571401
Learning and adaptive systems in artificial intelligence (68T05)
Pattern recognition, speech recognition (68T10)
Stopping times; optimal stopping problems; gambling theory (60G40)
Markov and semi-Markov decision processes (90C40)
Research exposition (monographs, survey articles) pertaining to computer science (68-02)
Optimal stopping in statistics (62L15)
Probabilistic games; gambling (91A60)
Related Items (66)
Robust sequential design for piecewise-stationary multi-armed bandit problem in the presence of outliers
Functional Sequential Treatment Allocation
Greedy Algorithm Almost Dominates in Smoothed Contextual Bandits
Bayesian Exploration: Incentivizing Exploration in Bayesian Games
Bayesian Brains and the Rényi Divergence
A Single-Index Model With a Surface-Link for Optimizing Individualized Dose Rules
Multiplayer Bandits Without Observing Collision Information
Fictitious Play in Zero-Sum Stochastic Games
Always Valid Inference: Continuous Monitoring of A/B Tests
Daisee: Adaptive importance sampling by balancing exploration and exploitation
Online learning for scheduling MIP heuristics
Multi-armed bandit-based hyper-heuristics for combinatorial optimization problems
Budget-limited distribution learning in multifidelity problems
Risk filtering and risk-averse control of Markovian systems subject to model uncertainty
Multi-armed bandits with censored consumption of resources
Dealing with expert bias in collective decision-making
Nonparametric learning for impulse control problems -- exploration vs. exploitation
A Theory of Bounded Inductive Rationality
A unified stochastic approximation framework for learning in games
Unnamed Item
Unnamed Item
Unnamed Item
A probabilistic reduced basis method for parameter-dependent problems
Bayesian adaptive randomization with compound utility functions
Learning Stationary Nash Equilibrium Policies in \(n\)-Player Stochastic Games with Independent Chains
Nearly Dimension-Independent Sparse Linear Bandit over Small Action Spaces via Best Subset Selection
Constrained regret minimization for multi-criterion multi-armed bandits
AI-driven liquidity provision in OTC financial markets
Asymptotic optimality for decentralised bandits
Temporal logic explanations for dynamic decision systems using anchors and Monte Carlo tree search
Safe multi-agent reinforcement learning for multi-robot control
Treatment recommendation with distributional targets
Exponential asymptotic optimality of Whittle index policy
Response-adaptive randomization in clinical trials: from myths to practical considerations
Empirical Gittins index strategies with \(\varepsilon\)-explorations for multi-armed bandit problems
Efficient and generalizable tuning strategies for stochastic gradient MCMC
Robust and efficient algorithms for conversational contextual bandit
Relaxing the i.i.d. assumption: adaptively minimax optimal regret via root-entropic regularization
Customization of J. Bather's UCB strategy for a Gaussian multiarmed bandit
Settling the sample complexity of model-based offline reinforcement learning
Unnamed Item
Unnamed Item
Deciding when to quit the gambler's ruin game with unknown probabilities
Ballooning multi-armed bandits
Locks, Bombs and Testing: The Case of Independent Locks
Unnamed Item
On concentration inequalities for vector-valued Lipschitz functions
Randomized allocation with nonparametric estimation for contextual multi-armed bandits with delayed rewards
Regularizing Double Machine Learning in Partially Linear Endogenous Models
Multi-objective multi-armed bandit with lexicographically ordered and satisficing objectives
Uncertainty calibration for probabilistic projection methods
Multi-armed bandit with sub-exponential rewards
Matching While Learning
Two-armed bandit problem and batch version of the mirror descent algorithm
A PAC algorithm in relative precision for bandit problem with costly sampling
Stochastic continuum-armed bandits with additive models: minimax regrets and adaptive algorithm
Fundamental design principles for reinforcement learning algorithms
Unnamed Item
Unnamed Item
Unnamed Item
A Markov decision process for response-adaptive randomization in clinical trials
Learning in Repeated Auctions
On the Bias, Risk, and Consistency of Sample Means in Multi-armed Bandits
Bypassing the Monster: A Faster and Simpler Optimal Algorithm for Contextual Bandits Under Realizability
A Bandit-Learning Approach to Multifidelity Approximation
Whittle index based Q-learning for restless bandits with average reward
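Several of the related items above concern UCB-type index strategies, the central class of algorithms treated in the indexed book. As a minimal illustrative sketch (not code from the book): the UCB1 policy plays each arm once, then repeatedly pulls the arm maximizing its empirical mean plus an exploration bonus \(\sqrt{2 \ln t / n_i}\). The Bernoulli reward model, arm means, and function name below are assumptions chosen for the example.

```python
import math
import random

def ucb1(arm_means, horizon, seed=0):
    """Sketch of UCB1 on Bernoulli arms with the given success probabilities.

    Returns per-arm pull counts and the total reward collected.
    """
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k      # number of times each arm has been pulled
    totals = [0.0] * k    # summed rewards observed per arm
    total_reward = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1   # initialization: pull every arm once
        else:
            # UCB index: empirical mean + sqrt(2 ln t / n_i)
            arm = max(
                range(k),
                key=lambda i: totals[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0  # Bernoulli draw
        counts[arm] += 1
        totals[arm] += reward
        total_reward += reward
    return counts, total_reward
```

With a long enough horizon, the policy concentrates its pulls on the best arm while still sampling the others at a logarithmic rate, which is the exploration-exploitation trade-off many of the listed works analyze.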
This page was built for publication: Bandit Algorithms