scientific article; zbMATH DE number 7164724
Publication:5214215
zbMath 1434.68515 ⋮ arXiv 1703.07608 ⋮ MaRDI QID Q5214215
Ian Osband, Benjamin van Roy, Zheng Wen, Daniel J. Russo
Publication date: 7 February 2020
Full work available at URL: https://arxiv.org/abs/1703.07608
Title: zbMATH Open Web Interface contents unavailable due to conflicting licenses.
Artificial neural networks and deep learning (68T07) ⋮ Bayesian inference (62F15) ⋮ Markov and semi-Markov decision processes (90C40) ⋮ Sequential statistical analysis (62L10)
Related Items (9)
Unnamed Item ⋮ Feel-Good Thompson Sampling for Contextual Bandits and Reinforcement Learning ⋮ Priors in Bayesian Deep Learning: A Review ⋮ Reinforcement Learning, Bit by Bit ⋮ Deep Reinforcement Learning: A State-of-the-Art Walkthrough ⋮ Unnamed Item ⋮ Unnamed Item ⋮ Sophisticated Inference ⋮ Fundamental design principles for reinforcement learning algorithms
Cites Work
- Unnamed Item
- Unnamed Item
- Unnamed Item
- Unnamed Item
- Unnamed Item
- Unnamed Item
- Unnamed Item
- Unnamed Item
- Knows what it knows: a framework for self-aware learning
- Reducing reinforcement learning to KWIK online regression
- Bootstrapping data arrays of arbitrary order
- Some asymptotic theory for the bootstrap
- Increasing risk: Some direct constructions
- Near-optimal reinforcement learning in polynomial time
- Bootstrap prediction and Bayesian prediction under misspecified models
- Q($\lambda$) with Off-Policy Corrections
- 10.1162/153244303765208377
- Algorithms for Reinforcement Learning
- An analysis of temporal-difference learning with function approximation
- A Tutorial on Thompson Sampling
- Near-Optimal Regret Bounds for Thompson Sampling
- Learning to Optimize via Information-Directed Sampling
- How Much Does Your Data Exploration Overfit? Controlling Bias via Information Usage
- Learning to Optimize via Posterior Sampling
- Discounted Dynamic Programming
- The Efficiency Analysis of Choices Involving Risk