The following pages link to Rémi Munos (Q366993):
Displaying 35 items.
- Kullback-Leibler upper confidence bounds for optimal sequential allocation (Q366995) (← links)
- Minimax PAC bounds on the sample complexity of reinforcement learning with a generative model (Q399890) (← links)
- Learning with stochastic inputs and adversarial outputs (Q439998) (← links)
- Regret bounds for restless Markov bandits (Q465253) (← links)
- Minimax number of strata for online stratified sampling: the case of noisy samples (Q465255) (← links)
- Numerical methods for the pricing of swing options: a stochastic control approach (Q861551) (← links)
- Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path (Q1009248) (← links)
- Exploration-exploitation tradeoff using variance estimates in multi-armed bandits (Q1017665) (← links)
- A study of reinforcement learning in the continuous case by the means of viscosity solutions (Q1584837) (← links)
- Continuous-action planning for discounted infinite-horizon nonlinear optimal control with Lipschitz values (Q1642208) (← links)
- Consistency of a simple multidimensional scheme for Hamilton-Jacobi-Bellman equations (Q1773340) (← links)
- Concentration bounds for temporal difference learning with linear function approximation: the case of batch data and uniform sampling (Q2051259) (← links)
- Pure exploration in finitely-armed and continuous-armed bandits (Q2431430) (← links)
- An anti-diffusive scheme for viability problems (Q2497775) (← links)
- (Q2810787) (← links)
- Q($\lambda$) with Off-Policy Corrections (Q2831390) (← links)
- (Q3093352) (← links)
- (Q3093369) (← links)
- Upper-Confidence-Bound Algorithms for Active Learning in Multi-armed Bandits (Q3093949) (← links)
- (Q3096132) (← links)
- Thompson Sampling: An Asymptotically Optimal Finite-Time Analysis (Q3164821) (← links)
- Regret Bounds for Restless Markov Bandits (Q3164822) (← links)
- Minimax Number of Strata for Online Stratified Sampling Given Noisy Samples (Q3164823) (← links)
- Tuning Bandit Algorithms in Stochastic Environments (Q3520056) (← links)
- Pure Exploration in Multi-armed Bandits Problems (Q3648740) (← links)
- (Q5149015) (← links)
- From Bandits to Monte-Carlo Tree Search: The Optimistic Principle Applied to Optimization and Planning (Q5168384) (← links)
- Learning Near-Optimal Policies with Bellman-Residual Minimization Based Fitted Policy Iteration and a Single Sample Path (Q5307594) (← links)
- Sensitivity Analysis Using Itô–Malliavin Calculus and Martingales, and Application to Stochastic Optimal Control (Q5317090) (← links)
- (Q5396654) (← links)
- (Q5405205) (← links)
- (Q5405216) (← links)
- Performance Bounds in $L_p$‐norm for Approximate Value Iteration (Q5453575) (← links)
- (Q5744838) (← links)
- Editors’ Introduction (Q5891268) (← links)