Bandit Algorithms

From MaRDI portal
Publication:5109247

DOI: 10.1017/9781108571401
zbMath: 1439.68002
OpenAlex: W4206530644
MaRDI QID: Q5109247

Tor Lattimore, Csaba Szepesvári

Publication date: 11 May 2020

Full work available at URL: https://doi.org/10.1017/9781108571401




Related Items (66)

Robust sequential design for piecewise-stationary multi-armed bandit problem in the presence of outliers
Functional Sequential Treatment Allocation
Greedy Algorithm Almost Dominates in Smoothed Contextual Bandits
Bayesian Exploration: Incentivizing Exploration in Bayesian Games
Bayesian Brains and the Rényi Divergence
A Single-Index Model With a Surface-Link for Optimizing Individualized Dose Rules
Multiplayer Bandits Without Observing Collision Information
Fictitious Play in Zero-Sum Stochastic Games
Always Valid Inference: Continuous Monitoring of A/B Tests
Daisee: Adaptive importance sampling by balancing exploration and exploitation
Online learning for scheduling MIP heuristics
Multi-armed bandit-based hyper-heuristics for combinatorial optimization problems
Budget-limited distribution learning in multifidelity problems
Risk filtering and risk-averse control of Markovian systems subject to model uncertainty
Multi-armed bandits with censored consumption of resources
Dealing with expert bias in collective decision-making
Nonparametric learning for impulse control problems -- exploration vs. exploitation
A Theory of Bounded Inductive Rationality
A unified stochastic approximation framework for learning in games
Unnamed Item
Unnamed Item
Unnamed Item
A probabilistic reduced basis method for parameter-dependent problems
Bayesian adaptive randomization with compound utility functions
Learning Stationary Nash Equilibrium Policies in \(n\)-Player Stochastic Games with Independent Chains
Nearly Dimension-Independent Sparse Linear Bandit over Small Action Spaces via Best Subset Selection
Constrained regret minimization for multi-criterion multi-armed bandits
AI-driven liquidity provision in OTC financial markets
Asymptotic optimality for decentralised bandits
Temporal logic explanations for dynamic decision systems using anchors and Monte Carlo tree search
Safe multi-agent reinforcement learning for multi-robot control
Treatment recommendation with distributional targets
Exponential asymptotic optimality of Whittle index policy
Response-adaptive randomization in clinical trials: from myths to practical considerations
Empirical Gittins index strategies with \(\varepsilon\)-explorations for multi-armed bandit problems
Efficient and generalizable tuning strategies for stochastic gradient MCMC
Robust and efficient algorithms for conversational contextual bandit
Relaxing the i.i.d. assumption: adaptively minimax optimal regret via root-entropic regularization
Customization of J. Bather's UCB strategy for a Gaussian multiarmed bandit
Settling the sample complexity of model-based offline reinforcement learning
Unnamed Item
Unnamed Item
Deciding when to quit the gambler's ruin game with unknown probabilities
Ballooning multi-armed bandits
Locks, Bombs and Testing: The Case of Independent Locks
Unnamed Item
On concentration inequalities for vector-valued Lipschitz functions
Randomized allocation with nonparametric estimation for contextual multi-armed bandits with delayed rewards
Regularizing Double Machine Learning in Partially Linear Endogenous Models
Multi-objective multi-armed bandit with lexicographically ordered and satisficing objectives
Uncertainty calibration for probabilistic projection methods
Multi-armed bandit with sub-exponential rewards
Matching While Learning
Two-armed bandit problem and batch version of the mirror descent algorithm
A PAC algorithm in relative precision for bandit problem with costly sampling
Stochastic continuum-armed bandits with additive models: minimax regrets and adaptive algorithm
Fundamental design principles for reinforcement learning algorithms
Unnamed Item
Unnamed Item
Unnamed Item
A Markov decision process for response-adaptive randomization in clinical trials
Learning in Repeated Auctions
On the Bias, Risk, and Consistency of Sample Means in Multi-armed Bandits
Bypassing the Monster: A Faster and Simpler Optimal Algorithm for Contextual Bandits Under Realizability
A Bandit-Learning Approach to Multifidelity Approximation
Whittle index based Q-learning for restless bandits with average reward




This page was built for publication: Bandit Algorithms