Bandit Algorithms

From MaRDI portal

Publication: 5109247

DOI: 10.1017/9781108571401 · zbMath: 1439.68002 · OpenAlex: W4206530644 · MaRDI QID: Q5109247

Tor Lattimore, Csaba Szepesvári

Publication date: 11 May 2020

Full work available at URL: https://doi.org/10.1017/9781108571401






Related Items (88)

Robust sequential design for piecewise-stationary multi-armed bandit problem in the presence of outliers
Functional Sequential Treatment Allocation
Greedy Algorithm Almost Dominates in Smoothed Contextual Bandits
Bayesian Exploration: Incentivizing Exploration in Bayesian Games
Bayesian Brains and the Rényi Divergence
A Single-Index Model With a Surface-Link for Optimizing Individualized Dose Rules
Multiplayer Bandits Without Observing Collision Information
Fictitious Play in Zero-Sum Stochastic Games
Always Valid Inference: Continuous Monitoring of A/B Tests
Daisee: Adaptive importance sampling by balancing exploration and exploitation
Online learning for scheduling MIP heuristics
Multi-armed bandit-based hyper-heuristics for combinatorial optimization problems
Budget-limited distribution learning in multifidelity problems
Risk filtering and risk-averse control of Markovian systems subject to model uncertainty
Multi-armed bandits with censored consumption of resources
Dealing with expert bias in collective decision-making
Nonparametric learning for impulse control problems -- exploration vs. exploitation
A Theory of Bounded Inductive Rationality
A unified stochastic approximation framework for learning in games
Unnamed Item
Unnamed Item
Unnamed Item
A probabilistic reduced basis method for parameter-dependent problems
Bayesian adaptive randomization with compound utility functions
Learning Stationary Nash Equilibrium Policies in \(n\)-Player Stochastic Games with Independent Chains
Nearly Dimension-Independent Sparse Linear Bandit over Small Action Spaces via Best Subset Selection
Constrained regret minimization for multi-criterion multi-armed bandits
AI-driven liquidity provision in OTC financial markets
Asymptotic optimality for decentralised bandits
Temporal logic explanations for dynamic decision systems using anchors and Monte Carlo tree search
Safe multi-agent reinforcement learning for multi-robot control
Treatment recommendation with distributional targets
Exponential asymptotic optimality of Whittle index policy
Response-adaptive randomization in clinical trials: from myths to practical considerations
Empirical Gittins index strategies with \(\varepsilon\)-explorations for multi-armed bandit problems
Efficient and generalizable tuning strategies for stochastic gradient MCMC
Robust and efficient algorithms for conversational contextual bandit
Relaxing the i.i.d. assumption: adaptively minimax optimal regret via root-entropic regularization
Customization of J. Bather's UCB strategy for a Gaussian multiarmed bandit
Settling the sample complexity of model-based offline reinforcement learning
Unnamed Item
Unnamed Item
Deciding when to quit the gambler's ruin game with unknown probabilities
Doubly Robust Interval Estimation for Optimal Policy Evaluation in Online Learning
Ballooning multi-armed bandits
Locks, Bombs and Testing: The Case of Independent Locks
An exact bandit model for the risk-volatility tradeoff
An approximate control variates approach to multifidelity distribution estimation
Unnamed Item
On concentration inequalities for vector-valued Lipschitz functions
Randomized allocation with nonparametric estimation for contextual multi-armed bandits with delayed rewards
Exploiting action impact regularity and exogenous state variables for offline reinforcement learning
Optimistic MLE: a generic model-based algorithm for partially observable sequential decision making
Regularizing Double Machine Learning in Partially Linear Endogenous Models
Finding the optimal exploration-exploitation trade-off online through Bayesian risk estimation and minimization
Thompson sampling for networked control over unknown channels
Adaptive Algorithm for Multi-Armed Bandit Problem with High-Dimensional Covariates
Invariant description of control in a Gaussian one-armed bandit problem
Multinomial Thompson sampling for rating scales and prior considerations for calibrating uncertainty
Optimal analysis for bandit learning in matching markets with serial dictatorship
A modified EXP3 in adversarial bandits with multi-user delayed feedback
Optimization of two-alternative batch processing with parameter estimation based on data inside batches
Tracking the mean of a piecewise stationary sequence
Online learning in budget-constrained dynamic Colonel Blotto games
Surveillance for endemic infectious disease outbreaks: adaptive sampling using profile likelihood estimation
Certified multifidelity zeroth-order optimization
Deep spatial Q-learning for infectious disease control
Thompson sampling-based recursive block elimination for dynamic assignment under limited budget in pure-exploration
Risk preferences of learning algorithms
Integrating multi-armed bandit with local search for MaxSAT
Online learning in sequential Bayesian persuasion: handling unknown priors
Multi-objective multi-armed bandit with lexicographically ordered and satisficing objectives
Uncertainty calibration for probabilistic projection methods
Multi-armed bandit with sub-exponential rewards
Matching While Learning
Two-armed bandit problem and batch version of the mirror descent algorithm
A PAC algorithm in relative precision for bandit problem with costly sampling
Stochastic continuum-armed bandits with additive models: minimax regrets and adaptive algorithm
Fundamental design principles for reinforcement learning algorithms
Unnamed Item
Unnamed Item
Unnamed Item
A Markov decision process for response-adaptive randomization in clinical trials
Learning in Repeated Auctions
On the Bias, Risk, and Consistency of Sample Means in Multi-armed Bandits
Bypassing the Monster: A Faster and Simpler Optimal Algorithm for Contextual Bandits Under Realizability
A Bandit-Learning Approach to Multifidelity Approximation
Whittle index based Q-learning for restless bandits with average reward







This page was built for publication: Bandit Algorithms