Bandit Algorithms

From MaRDI portal

Publication: 5109247

DOI: 10.1017/9781108571401 · zbMath: 1439.68002 · OpenAlex: W4206530644 · MaRDI QID: Q5109247

Tor Lattimore, Csaba Szepesvári

Publication date: 11 May 2020

Full work available at URL: https://doi.org/10.1017/9781108571401






Related Items (88)

Robust sequential design for piecewise-stationary multi-armed bandit problem in the presence of outliers
Functional Sequential Treatment Allocation
Greedy Algorithm Almost Dominates in Smoothed Contextual Bandits
Bayesian Exploration: Incentivizing Exploration in Bayesian Games
Bayesian Brains and the Rényi Divergence
A Single-Index Model With a Surface-Link for Optimizing Individualized Dose Rules
Multiplayer Bandits Without Observing Collision Information
Fictitious Play in Zero-Sum Stochastic Games
Always Valid Inference: Continuous Monitoring of A/B Tests
Daisee: Adaptive importance sampling by balancing exploration and exploitation
Online learning for scheduling MIP heuristics
Multi-armed bandit-based hyper-heuristics for combinatorial optimization problems
Budget-limited distribution learning in multifidelity problems
Risk filtering and risk-averse control of Markovian systems subject to model uncertainty
Multi-armed bandits with censored consumption of resources
Dealing with expert bias in collective decision-making
Nonparametric learning for impulse control problems -- exploration vs. exploitation
A Theory of Bounded Inductive Rationality
A unified stochastic approximation framework for learning in games
Unnamed Item
Unnamed Item
Unnamed Item
A probabilistic reduced basis method for parameter-dependent problems
Bayesian adaptive randomization with compound utility functions
Learning Stationary Nash Equilibrium Policies in \(n\)-Player Stochastic Games with Independent Chains
Nearly Dimension-Independent Sparse Linear Bandit over Small Action Spaces via Best Subset Selection
Constrained regret minimization for multi-criterion multi-armed bandits
AI-driven liquidity provision in OTC financial markets
Asymptotic optimality for decentralised bandits
Temporal logic explanations for dynamic decision systems using anchors and Monte Carlo tree search
Safe multi-agent reinforcement learning for multi-robot control
Treatment recommendation with distributional targets
Exponential asymptotic optimality of Whittle index policy
Response-adaptive randomization in clinical trials: from myths to practical considerations
Empirical Gittins index strategies with \(\varepsilon\)-explorations for multi-armed bandit problems
Efficient and generalizable tuning strategies for stochastic gradient MCMC
Robust and efficient algorithms for conversational contextual bandit
Relaxing the i.i.d. assumption: adaptively minimax optimal regret via root-entropic regularization
Customization of J. Bather's UCB strategy for a Gaussian multiarmed bandit
Settling the sample complexity of model-based offline reinforcement learning
Unnamed Item
Unnamed Item
Deciding when to quit the gambler's ruin game with unknown probabilities
Doubly Robust Interval Estimation for Optimal Policy Evaluation in Online Learning
Ballooning multi-armed bandits
Locks, Bombs and Testing: The Case of Independent Locks
An exact bandit model for the risk-volatility tradeoff
An approximate control variates approach to multifidelity distribution estimation
Unnamed Item
On concentration inequalities for vector-valued Lipschitz functions
Randomized allocation with nonparametric estimation for contextual multi-armed bandits with delayed rewards
Exploiting action impact regularity and exogenous state variables for offline reinforcement learning
Optimistic MLE: a generic model-based algorithm for partially observable sequential decision making
Regularizing Double Machine Learning in Partially Linear Endogenous Models
Finding the optimal exploration-exploitation trade-off online through Bayesian risk estimation and minimization
Thompson sampling for networked control over unknown channels
Adaptive Algorithm for Multi-Armed Bandit Problem with High-Dimensional Covariates
Invariant description of control in a Gaussian one-armed bandit problem
Multinomial Thompson sampling for rating scales and prior considerations for calibrating uncertainty
Optimal analysis for bandit learning in matching markets with serial dictatorship
A modified EXP3 in adversarial bandits with multi-user delayed feedback
Optimization of two-alternative batch processing with parameter estimation based on data inside batches
Tracking the mean of a piecewise stationary sequence
Online learning in budget-constrained dynamic Colonel Blotto games
Surveillance for endemic infectious disease outbreaks: adaptive sampling using profile likelihood estimation
Certified multifidelity zeroth-order optimization
Deep spatial Q-learning for infectious disease control
Thompson sampling-based recursive block elimination for dynamic assignment under limited budget in pure-exploration
Risk preferences of learning algorithms
Integrating multi-armed bandit with local search for MaxSAT
Online learning in sequential Bayesian persuasion: handling unknown priors
Multi-objective multi-armed bandit with lexicographically ordered and satisficing objectives
Uncertainty calibration for probabilistic projection methods
Multi-armed bandit with sub-exponential rewards
Matching While Learning
Two-armed bandit problem and batch version of the mirror descent algorithm
A PAC algorithm in relative precision for bandit problem with costly sampling
Stochastic continuum-armed bandits with additive models: minimax regrets and adaptive algorithm
Fundamental design principles for reinforcement learning algorithms
Unnamed Item
Unnamed Item
Unnamed Item
A Markov decision process for response-adaptive randomization in clinical trials
Learning in Repeated Auctions
On the Bias, Risk, and Consistency of Sample Means in Multi-armed Bandits
Bypassing the Monster: A Faster and Simpler Optimal Algorithm for Contextual Bandits Under Realizability
A Bandit-Learning Approach to Multifidelity Approximation
Whittle index based Q-learning for restless bandits with average reward







This page was built for publication: Bandit Algorithms