Rémi Munos

From MaRDI portal
Person:366993

Available identifiers

zbMath Open: munos.remi
MaRDI QID: Q366993

List of research outcomes





Publication | Date of Publication | Type
Concentration bounds for temporal difference learning with linear function approximation: the case of batch data and uniform sampling | 2021-11-24 | Paper
https://portal.mardi4nfdi.de/entity/Q5149015 | 2021-02-05 | Paper
Continuous-action planning for discounted infinite-horizon nonlinear optimal control with Lipschitz values | 2018-06-20 | Paper
Q($\lambda$) with Off-Policy Corrections | 2016-11-09 | Paper
Analysis of classification-based policy iteration algorithms | 2016-06-06 | Paper
https://portal.mardi4nfdi.de/entity/Q5744838 | 2016-02-19 | Paper
Regret bounds for restless Markov bandits | 2014-10-31 | Paper
Minimax number of strata for online stratified sampling: the case of noisy samples | 2014-10-31 | Paper
Minimax PAC bounds on the sample complexity of reinforcement learning with a generative model | 2014-08-20 | Paper
From Bandits to Monte-Carlo Tree Search: The Optimistic Principle Applied to Optimization and Planning | 2014-07-04 | Paper
https://portal.mardi4nfdi.de/entity/Q5405205 | 2014-04-01 | Paper
https://portal.mardi4nfdi.de/entity/Q5405216 | 2014-04-01 | Paper
https://portal.mardi4nfdi.de/entity/Q5396654 | 2014-02-03 | Paper
Editors’ Introduction | 2013-11-06 | Paper
Kullback-Leibler upper confidence bounds for optimal sequential allocation | 2013-09-25 | Paper
Thompson Sampling: An Asymptotically Optimal Finite-Time Analysis | 2012-10-16 | Paper
Minimax Number of Strata for Online Stratified Sampling Given Noisy Samples | 2012-10-16 | Paper
Regret Bounds for Restless Markov Bandits | 2012-10-16 | Paper
Learning with stochastic inputs and adversarial outputs | 2012-08-17 | Paper
Bandit Theory meets Compressed Sensing for high dimensional Stochastic Linear Bandit | 2012-05-18 | Paper
https://portal.mardi4nfdi.de/entity/Q3096132 | 2011-11-08 | Paper
Upper-Confidence-Bound Algorithms for Active Learning in Multi-armed Bandits | 2011-10-19 | Paper
https://portal.mardi4nfdi.de/entity/Q3093369 | 2011-10-12 | Paper
https://portal.mardi4nfdi.de/entity/Q3093352 | 2011-10-12 | Paper
A Finite-Time Analysis of Multi-armed Bandits Problems with Kullback-Leibler Divergences | 2011-05-29 | Paper
Pure exploration in finitely-armed and continuous-armed bandits | 2011-04-14 | Paper
Pure Exploration in Multi-armed Bandits Problems | 2009-12-01 | Paper
Exploration-exploitation tradeoff using variance estimates in multi-armed bandits | 2009-05-12 | Paper
Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path | 2009-03-31 | Paper
Tuning Bandit Algorithms in Stochastic Environments | 2008-08-19 | Paper
Performance Bounds in $L_p$‐norm for Approximate Value Iteration | 2008-04-03 | Paper
Pure Exploration for Multi-Armed Bandit Problems | 2008-02-19 | Paper
Learning Near-Optimal Policies with Bellman-Residual Minimization Based Fitted Policy Iteration and a Single Sample Path | 2007-09-14 | Paper
Numerical methods for the pricing of swing options: a stochastic control approach | 2007-01-29 | Paper
An anti-diffusive scheme for viability problems | 2006-08-04 | Paper
Sensitivity Analysis Using Itô-Malliavin Calculus and Martingales, and Application to Stochastic Optimal Control | 2005-09-15 | Paper
Consistency of a simple multidimensional scheme for Hamilton-Jacobi-Bellman equations | 2005-04-28 | Paper
A study of reinforcement learning in the continuous case by the means of viscosity solutions | 2000-11-05 | Paper

