Rémi Munos

From MaRDI portal



List of research outcomes

This list is incomplete: at the moment it covers only items from zbMATH Open and arXiv. We are working on additional sources, so please check back here soon!

| Publication | Venue | Date of Publication | Type |
| --- | --- | --- | --- |
| Concentration bounds for temporal difference learning with linear function approximation: the case of batch data and uniform sampling | Machine Learning | 2021-11-24 | Paper |
| scientific article; zbMATH DE number 7306905 | | 2021-02-05 | Paper |
| Continuous-action planning for discounted infinite-horizon nonlinear optimal control with Lipschitz values | Automatica | 2018-06-20 | Paper |
| \(\text{Q}(\lambda)\) with off-policy corrections | Lecture Notes in Computer Science | 2016-11-09 | Paper |
| Analysis of classification-based policy iteration algorithms | Journal of Machine Learning Research (JMLR) | 2016-06-06 | Paper |
| Adaptive strategy for stratified Monte Carlo sampling | | 2016-02-19 | Paper |
| Regret bounds for restless Markov bandits | Theoretical Computer Science | 2014-10-31 | Paper |
| Minimax number of strata for online stratified sampling: the case of noisy samples | Theoretical Computer Science | 2014-10-31 | Paper |
| Minimax PAC bounds on the sample complexity of reinforcement learning with a generative model | Machine Learning | 2014-08-20 | Paper |
| From bandits to Monte-Carlo tree search: the optimistic principle applied to optimization and planning | Foundations and Trends® in Machine Learning | 2014-07-04 | Paper |
| scientific article; zbMATH DE number 6276198 | | 2014-04-01 | Paper |
| Finite-sample analysis of least-squares policy iteration | | 2014-04-01 | Paper |
| \(X\)-armed bandits | | 2014-02-03 | Paper |
| Editors' introduction | Lecture Notes in Computer Science | 2013-11-06 | Paper |
| Kullback-Leibler upper confidence bounds for optimal sequential allocation | The Annals of Statistics | 2013-09-25 | Paper |
| Thompson sampling: an asymptotically optimal finite-time analysis | Lecture Notes in Computer Science | 2012-10-16 | Paper |
| Minimax number of strata for online stratified sampling given noisy samples | Lecture Notes in Computer Science | 2012-10-16 | Paper |
| Regret Bounds for Restless Markov Bandits | Lecture Notes in Computer Science | 2012-10-16 | Paper |
| Learning with stochastic inputs and adversarial outputs | Journal of Computer and System Sciences | 2012-08-17 | Paper |
| Bandit Theory meets Compressed Sensing for high dimensional Stochastic Linear Bandit | | 2012-05-18 | Paper |
| Finite-time bounds for fitted value iteration | | 2011-11-08 | Paper |
| Upper-Confidence-Bound Algorithms for Active Learning in Multi-armed Bandits | Lecture Notes in Computer Science | 2011-10-19 | Paper |
| Policy gradient in continuous time | | 2011-10-12 | Paper |
| Geometric variance reduction in Markov chains: application to value function and gradient estimation | | 2011-10-12 | Paper |
| A Finite-Time Analysis of Multi-armed Bandits Problems with Kullback-Leibler Divergences | | 2011-05-29 | Paper |
| Pure exploration in finitely-armed and continuous-armed bandits | Theoretical Computer Science | 2011-04-14 | Paper |
| Pure exploration in multi-armed bandits problems | Lecture Notes in Computer Science | 2009-12-01 | Paper |
| Exploration-exploitation tradeoff using variance estimates in multi-armed bandits | Theoretical Computer Science | 2009-05-12 | Paper |
| Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path | Machine Learning | 2009-03-31 | Paper |
| Tuning Bandit Algorithms in Stochastic Environments | Lecture Notes in Computer Science | 2008-08-19 | Paper |
| Performance Bounds in \(L_p\)-norm for Approximate Value Iteration | SIAM Journal on Control and Optimization | 2008-04-03 | Paper |
| Pure Exploration for Multi-Armed Bandit Problems | (available as arXiv preprint) | 2008-02-19 | Paper |
| Learning Near-Optimal Policies with Bellman-Residual Minimization Based Fitted Policy Iteration and a Single Sample Path | Learning Theory | 2007-09-14 | Paper |
| Numerical methods for the pricing of swing options: a stochastic control approach | Methodology and Computing in Applied Probability | 2007-01-29 | Paper |
| An anti-diffusive scheme for viability problems | Applied Numerical Mathematics | 2006-08-04 | Paper |
| Sensitivity Analysis Using Itô-Malliavin Calculus and Martingales, and Application to Stochastic Optimal Control | SIAM Journal on Control and Optimization | 2005-09-15 | Paper |
| Consistency of a simple multidimensional scheme for Hamilton-Jacobi-Bellman equations | Comptes Rendus. Mathématique. Académie des Sciences, Paris | 2005-04-28 | Paper |
| A study of reinforcement learning in the continuous case by the means of viscosity solutions | Machine Learning | 2000-11-05 | Paper |



