Rémi Munos

From MaRDI portal
Person:366993

Available identifiers

zbMath Open: munos.remi
MaRDI QID: Q366993

List of research outcomes

Publication | Date of Publication | Type
Concentration bounds for temporal difference learning with linear function approximation: the case of batch data and uniform sampling | 2021-11-24 | Paper
https://portal.mardi4nfdi.de/entity/Q5149015 | 2021-02-05 | Paper
Continuous-action planning for discounted infinite-horizon nonlinear optimal control with Lipschitz values | 2018-06-20 | Paper
Q($\lambda$) with Off-Policy Corrections | 2016-11-09 | Paper
https://portal.mardi4nfdi.de/entity/Q2810787 | 2016-06-06 | Paper
https://portal.mardi4nfdi.de/entity/Q5744838 | 2016-02-19 | Paper
Regret bounds for restless Markov bandits | 2014-10-31 | Paper
Minimax number of strata for online stratified sampling: the case of noisy samples | 2014-10-31 | Paper
Minimax PAC bounds on the sample complexity of reinforcement learning with a generative model | 2014-08-20 | Paper
From Bandits to Monte-Carlo Tree Search: The Optimistic Principle Applied to Optimization and Planning | 2014-07-04 | Paper
https://portal.mardi4nfdi.de/entity/Q5405205 | 2014-04-01 | Paper
https://portal.mardi4nfdi.de/entity/Q5405216 | 2014-04-01 | Paper
https://portal.mardi4nfdi.de/entity/Q5396654 | 2014-02-03 | Paper
Editors’ Introduction | 2013-11-06 | Paper
Kullback-Leibler upper confidence bounds for optimal sequential allocation | 2013-09-25 | Paper
Thompson Sampling: An Asymptotically Optimal Finite-Time Analysis | 2012-10-16 | Paper
Regret Bounds for Restless Markov Bandits | 2012-10-16 | Paper
Minimax Number of Strata for Online Stratified Sampling Given Noisy Samples | 2012-10-16 | Paper
Learning with stochastic inputs and adversarial outputs | 2012-08-17 | Paper
Bandit Theory meets Compressed Sensing for high dimensional Stochastic Linear Bandit | 2012-05-18 | Paper
https://portal.mardi4nfdi.de/entity/Q3096132 | 2011-11-08 | Paper
Upper-Confidence-Bound Algorithms for Active Learning in Multi-armed Bandits | 2011-10-19 | Paper
https://portal.mardi4nfdi.de/entity/Q3093352 | 2011-10-12 | Paper
https://portal.mardi4nfdi.de/entity/Q3093369 | 2011-10-12 | Paper
A Finite-Time Analysis of Multi-armed Bandits Problems with Kullback-Leibler Divergences | 2011-05-29 | Paper
Pure exploration in finitely-armed and continuous-armed bandits | 2011-04-14 | Paper
Pure Exploration in Multi-armed Bandits Problems | 2009-12-01 | Paper
Exploration-exploitation tradeoff using variance estimates in multi-armed bandits | 2009-05-12 | Paper
Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path | 2009-03-31 | Paper
Tuning Bandit Algorithms in Stochastic Environments | 2008-08-19 | Paper
Performance Bounds in $L_p$-norm for Approximate Value Iteration | 2008-04-03 | Paper
Pure Exploration for Multi-Armed Bandit Problems | 2008-02-19 | Paper
Learning Near-Optimal Policies with Bellman-Residual Minimization Based Fitted Policy Iteration and a Single Sample Path | 2007-09-14 | Paper
Numerical methods for the pricing of swing options: a stochastic control approach | 2007-01-29 | Paper
An anti-diffusive scheme for viability problems | 2006-08-04 | Paper
Sensitivity Analysis Using Itô–Malliavin Calculus and Martingales, and Application to Stochastic Optimal Control | 2005-09-15 | Paper
Consistency of a simple multidimensional scheme for Hamilton-Jacobi-Bellman equations | 2005-04-28 | Paper
A study of reinforcement learning in the continuous case by the means of viscosity solutions | 2000-11-05 | Paper

Doctoral students

No records found.


Known relations from the MaRDI Knowledge Graph

Property | Value
MaRDI profile type | MaRDI person profile
instance of | human


This page was built for person: Rémi Munos