Publication:5396640
zbMath: 1280.91039
MaRDI QID: Q5396640
Publication date: 3 February 2014
Full work available at URL: http://www.jmlr.org/papers/v12/hazan11a.html
91B06: Decision theory
68T05: Learning and adaptive systems in artificial intelligence
90C40: Markov and semi-Markov decision processes
91A60: Probabilistic games; gambling
Related Items
Unnamed Item
Optimal Exploration–Exploitation in a Multi-armed Bandit Problem with Non-stationary Rewards
An online portfolio selection algorithm with regret logarithmic in price variation
Doubly robust policy evaluation and optimization
Extracting certainty from uncertainty: regret bounded by variation in costs
Stochastic continuum-armed bandits with additive models: minimax regrets and adaptive algorithm
Truthful Mechanisms with Implicit Payment Computation