Tuning Bandit Algorithms in Stochastic Environments
From MaRDI portal
Publication:3520056
DOI: 10.1007/978-3-540-75225-7_15, zbMath: 1142.68382, OpenAlex: W1583155004, MaRDI QID: Q3520056
Csaba Szepesvári, Jean-Yves Audibert, Rémi Munos
Publication date: 19 August 2008
Published in: Lecture Notes in Computer Science
Full work available at URL: https://hal.inria.fr/inria-00203487/file/ucb_alt.pdf
Computational learning theory (68Q32)
Stopping times; optimal stopping problems; gambling theory (60G40)
Related Items
AI-driven liquidity provision in OTC financial markets, Corruption-tolerant bandit learning, Reward-Modulated Hebbian Learning of Decision Making, Unnamed Item, Detecting concept change in dynamic data streams, Preference-based reinforcement learning: evolutionary direct policy search using a preference-based racing algorithm
Cites Work