Bandit Algorithms

From MaRDI portal
Publication:5109247

DOI: 10.1017/9781108571401
zbMath: 1439.68002
OpenAlex: W4206530644
MaRDI QID: Q5109247

Tor Lattimore, Csaba Szepesvári

Publication date: 11 May 2020

Full work available at URL: https://doi.org/10.1017/9781108571401




Related Items (66)

Robust sequential design for piecewise-stationary multi-armed bandit problem in the presence of outliers
Functional Sequential Treatment Allocation
Greedy Algorithm Almost Dominates in Smoothed Contextual Bandits
Bayesian Exploration: Incentivizing Exploration in Bayesian Games
Bayesian Brains and the Rényi Divergence
A Single-Index Model With a Surface-Link for Optimizing Individualized Dose Rules
Multiplayer Bandits Without Observing Collision Information
Fictitious Play in Zero-Sum Stochastic Games
Always Valid Inference: Continuous Monitoring of A/B Tests
Daisee: Adaptive importance sampling by balancing exploration and exploitation
Online learning for scheduling MIP heuristics
Multi-armed bandit-based hyper-heuristics for combinatorial optimization problems
Budget-limited distribution learning in multifidelity problems
Risk filtering and risk-averse control of Markovian systems subject to model uncertainty
Multi-armed bandits with censored consumption of resources
Dealing with expert bias in collective decision-making
Nonparametric learning for impulse control problems -- exploration vs. exploitation
A Theory of Bounded Inductive Rationality
A unified stochastic approximation framework for learning in games
Unnamed Item
Unnamed Item
Unnamed Item
A probabilistic reduced basis method for parameter-dependent problems
Bayesian adaptive randomization with compound utility functions
Learning Stationary Nash Equilibrium Policies in \(n\)-Player Stochastic Games with Independent Chains
Nearly Dimension-Independent Sparse Linear Bandit over Small Action Spaces via Best Subset Selection
Constrained regret minimization for multi-criterion multi-armed bandits
AI-driven liquidity provision in OTC financial markets
Asymptotic optimality for decentralised bandits
Temporal logic explanations for dynamic decision systems using anchors and Monte Carlo tree search
Safe multi-agent reinforcement learning for multi-robot control
Treatment recommendation with distributional targets
Exponential asymptotic optimality of Whittle index policy
Response-adaptive randomization in clinical trials: from myths to practical considerations
Empirical Gittins index strategies with \(\varepsilon\)-explorations for multi-armed bandit problems
Efficient and generalizable tuning strategies for stochastic gradient MCMC
Robust and efficient algorithms for conversational contextual bandit
Relaxing the i.i.d. assumption: adaptively minimax optimal regret via root-entropic regularization
Customization of J. Bather's UCB strategy for a Gaussian multiarmed bandit
Settling the sample complexity of model-based offline reinforcement learning
Unnamed Item
Unnamed Item
Deciding when to quit the gambler's ruin game with unknown probabilities
Ballooning multi-armed bandits
Locks, Bombs and Testing: The Case of Independent Locks
Unnamed Item
On concentration inequalities for vector-valued Lipschitz functions
Randomized allocation with nonparametric estimation for contextual multi-armed bandits with delayed rewards
Regularizing Double Machine Learning in Partially Linear Endogenous Models
Multi-objective multi-armed bandit with lexicographically ordered and satisficing objectives
Uncertainty calibration for probabilistic projection methods
Multi-armed bandit with sub-exponential rewards
Matching While Learning
Two-armed bandit problem and batch version of the mirror descent algorithm
A PAC algorithm in relative precision for bandit problem with costly sampling
Stochastic continuum-armed bandits with additive models: minimax regrets and adaptive algorithm
Fundamental design principles for reinforcement learning algorithms
Unnamed Item
Unnamed Item
Unnamed Item
A Markov decision process for response-adaptive randomization in clinical trials
Learning in Repeated Auctions
On the Bias, Risk, and Consistency of Sample Means in Multi-armed Bandits
Bypassing the Monster: A Faster and Simpler Optimal Algorithm for Contextual Bandits Under Realizability
A Bandit-Learning Approach to Multifidelity Approximation
Whittle index based Q-learning for restless bandits with average reward




This page was built for publication: Bandit Algorithms