Pages that link to "Item:Q366993"

From MaRDI portal

The following pages link to Rémi Munos (Q366993):

Displaying 35 items.

Kullback-Leibler upper confidence bounds for optimal sequential allocation (Q366995)
Minimax PAC bounds on the sample complexity of reinforcement learning with a generative model (Q399890)
Learning with stochastic inputs and adversarial outputs (Q439998)
Regret bounds for restless Markov bandits (Q465253)
Minimax number of strata for online stratified sampling: the case of noisy samples (Q465255)
Numerical methods for the pricing of swing options: a stochastic control approach (Q861551)
Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path (Q1009248)
Exploration-exploitation tradeoff using variance estimates in multi-armed bandits (Q1017665)
A study of reinforcement learning in the continuous case by the means of viscosity solutions (Q1584837)
Continuous-action planning for discounted infinite-horizon nonlinear optimal control with Lipschitz values (Q1642208)
Consistency of a simple multidimensional scheme for Hamilton-Jacobi-Bellman equations (Q1773340)
Concentration bounds for temporal difference learning with linear function approximation: the case of batch data and uniform sampling (Q2051259)
Pure exploration in finitely-armed and continuous-armed bandits (Q2431430)
An anti-diffusive scheme for viability problems (Q2497775)
(Q2810787)
Q(λ) with Off-Policy Corrections (Q2831390)
(Q3093352)
(Q3093369)
Upper-Confidence-Bound Algorithms for Active Learning in Multi-armed Bandits (Q3093949)
(Q3096132)
Thompson Sampling: An Asymptotically Optimal Finite-Time Analysis (Q3164821)
Regret Bounds for Restless Markov Bandits (Q3164822)
Minimax Number of Strata for Online Stratified Sampling Given Noisy Samples (Q3164823)
Tuning Bandit Algorithms in Stochastic Environments (Q3520056)
Pure Exploration in Multi-armed Bandits Problems (Q3648740)
(Q5149015)
From Bandits to Monte-Carlo Tree Search: The Optimistic Principle Applied to Optimization and Planning (Q5168384)
Learning Near-Optimal Policies with Bellman-Residual Minimization Based Fitted Policy Iteration and a Single Sample Path (Q5307594)
Sensitivity Analysis Using Itô-Malliavin Calculus and Martingales, and Application to Stochastic Optimal Control (Q5317090)
(Q5396654)
(Q5405205)
(Q5405216)
Performance Bounds in L_p-norm for Approximate Value Iteration (Q5453575)
(Q5744838)
Editors’ Introduction (Q5891268)

Retrieved from "https://portal.mardi4nfdi.de/wiki/Special:WhatLinksHere/Item:Q366993"