Bayesian policy reuse
DOI: 10.1007/S10994-016-5547-Y
zbMATH Open: 1454.68129
arXiv: 1505.00284
OpenAlex: W778742492
MaRDI QID: Q1689554
FDO: Q1689554
Majd Hawasly, Benjamin Rosman, Subramanian Ramamoorthy
Publication date: 12 January 2018
Published in: Machine Learning
Full work available at URL: https://arxiv.org/abs/1505.00284
Keywords: transfer learning; Bayesian decision theory; online learning; reinforcement learning; Bayesian optimisation; online bandits; policy reuse
Mathematics Subject Classification:
- Learning and adaptive systems in artificial intelligence (68T05)
- Online algorithms; streaming algorithms (68W27)
- Bayesian problems; characterization of Bayes procedures (62C10)
Cites Work
- Information-Theoretic Regret Bounds for Gaussian Process Optimization in the Bandit Setting
- Asymptotically efficient adaptive allocation rules
- Computing a classic index for finite-horizon bandits
- Finite-time analysis of the multiarmed bandit problem
- A Structured Multiarmed Bandit Problem and the Greedy Policy
- Transfer learning for reinforcement learning domains: a survey
- Title not available
- Bayesian policy gradient and actor-critic algorithms
- Title not available
Cited In (1)