Infomax strategies for an optimal balance between exploration and exploitation
DOI: 10.1007/S10955-016-1521-0
zbMATH Open: 1414.91116
arXiv: 1601.03073
OpenAlex: W2233334990
MaRDI QID: Q310029
FDO: Q310029
Authors: Gautam Reddy, A. Celani, Massimo Vergassola
Publication date: 7 September 2016
Published in: Journal of Statistical Physics
Full work available at URL: https://arxiv.org/abs/1601.03073
Recommendations
- A dynamic programming strategy to balance exploration and exploitation in the bandit problem
- scientific article; zbMATH DE number 1907146
- Learning to optimize via information-directed sampling
- Optimal selection with alternative information
- Reinforcement learning: exploration-exploitation dilemma in multi-agent foraging task
Keywords: large deviations; decision and information theory; exploration and exploitation; infomax; multi-armed bandits
Mathematics Subject Classification: Large deviations (60F10); Management decision making, including multiple objectives (90B50); Decision theory (91B06)
Cites Work
- Title not available
- Elements of Information Theory
- A Mathematical Theory of Communication
- Title not available
- Title not available
- Title not available
- Asymptotically efficient adaptive allocation rules
- Multi-armed bandit allocation indices. With a foreword by Peter Whittle.
- Title not available
- Title not available
- Title not available
- Finite-time analysis of the multiarmed bandit problem
- Kullback-Leibler upper confidence bounds for optimal sequential allocation
- Rényi Divergence and Kullback-Leibler Divergence
- A bound on the financial value of information
- The value of information for populations in varying environments
- Adaptive treatment allocation and the multi-armed bandit problem
- Thompson sampling: an asymptotically optimal finite-time analysis
- Title not available
- Information, Physics, and Computation
- Optimal stopping and dynamic allocation
- An asymptotically optimal policy for finite support models in the multiarmed bandit problem
- Optimal Adaptive Policies for Markov Decision Processes
Cited In (2)