| Publication | Date of Publication | Type |
|---|---|---|
| Concentration bounds for temporal difference learning with linear function approximation: the case of batch data and uniform sampling. *Machine Learning* | 2021-11-24 | Paper |
| scientific article; zbMATH DE number 7306905 | 2021-02-05 | Paper |
| Continuous-action planning for discounted infinite-horizon nonlinear optimal control with Lipschitz values. *Automatica* | 2018-06-20 | Paper |
| \(\text{Q}(\lambda)\) with off-policy corrections. *Lecture Notes in Computer Science* | 2016-11-09 | Paper |
| Analysis of classification-based policy iteration algorithms. *Journal of Machine Learning Research (JMLR)* | 2016-06-06 | Paper |
| Adaptive strategy for stratified Monte Carlo sampling | 2016-02-19 | Paper |
| Regret bounds for restless Markov bandits. *Theoretical Computer Science* | 2014-10-31 | Paper |
| Minimax number of strata for online stratified sampling: the case of noisy samples. *Theoretical Computer Science* | 2014-10-31 | Paper |
| Minimax PAC bounds on the sample complexity of reinforcement learning with a generative model. *Machine Learning* | 2014-08-20 | Paper |
| From bandits to Monte-Carlo tree search: the optimistic principle applied to optimization and planning. *Foundations and Trends® in Machine Learning* | 2014-07-04 | Paper |
| scientific article; zbMATH DE number 6276198 | 2014-04-01 | Paper |
| Finite-sample analysis of least-squares policy iteration | 2014-04-01 | Paper |
| \(X\)-armed bandits | 2014-02-03 | Paper |
| Editors' introduction. *Lecture Notes in Computer Science* | 2013-11-06 | Paper |
| Kullback-Leibler upper confidence bounds for optimal sequential allocation. *The Annals of Statistics* | 2013-09-25 | Paper |
| Thompson sampling: an asymptotically optimal finite-time analysis. *Lecture Notes in Computer Science* | 2012-10-16 | Paper |
| Minimax number of strata for online stratified sampling given noisy samples. *Lecture Notes in Computer Science* | 2012-10-16 | Paper |
| Regret Bounds for Restless Markov Bandits. *Lecture Notes in Computer Science* | 2012-10-16 | Paper |
| Learning with stochastic inputs and adversarial outputs. *Journal of Computer and System Sciences* | 2012-08-17 | Paper |
| Bandit Theory meets Compressed Sensing for high dimensional Stochastic Linear Bandit | 2012-05-18 | Paper |
| Finite-time bounds for fitted value iteration | 2011-11-08 | Paper |
| Upper-Confidence-Bound Algorithms for Active Learning in Multi-armed Bandits. *Lecture Notes in Computer Science* | 2011-10-19 | Paper |
| Policy gradient in continuous time | 2011-10-12 | Paper |
| Geometric variance reduction in Markov chains: application to value function and gradient estimation | 2011-10-12 | Paper |
| A Finite-Time Analysis of Multi-armed Bandits Problems with Kullback-Leibler Divergences | 2011-05-29 | Paper |
| Pure exploration in finitely-armed and continuous-armed bandits. *Theoretical Computer Science* | 2011-04-14 | Paper |
| Pure exploration in multi-armed bandits problems. *Lecture Notes in Computer Science* | 2009-12-01 | Paper |
| Exploration-exploitation tradeoff using variance estimates in multi-armed bandits. *Theoretical Computer Science* | 2009-05-12 | Paper |
| Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path. *Machine Learning* | 2009-03-31 | Paper |
| Tuning Bandit Algorithms in Stochastic Environments. *Lecture Notes in Computer Science* | 2008-08-19 | Paper |
| Performance Bounds in \(L_p\)-norm for Approximate Value Iteration. *SIAM Journal on Control and Optimization* | 2008-04-03 | Paper |
| Pure Exploration for Multi-Armed Bandit Problems (available as arXiv preprint) | 2008-02-19 | Paper |
| Learning Near-Optimal Policies with Bellman-Residual Minimization Based Fitted Policy Iteration and a Single Sample Path. *Learning Theory* | 2007-09-14 | Paper |
| Numerical methods for the pricing of swing options: a stochastic control approach. *Methodology and Computing in Applied Probability* | 2007-01-29 | Paper |
| An anti-diffusive scheme for viability problems. *Applied Numerical Mathematics* | 2006-08-04 | Paper |
| Sensitivity Analysis Using Itô-Malliavin Calculus and Martingales, and Application to Stochastic Optimal Control. *SIAM Journal on Control and Optimization* | 2005-09-15 | Paper |
| Consistency of a simple multidimensional scheme for Hamilton-Jacobi-Bellman equations. *Comptes Rendus. Mathématique. Académie des Sciences, Paris* | 2005-04-28 | Paper |
| A study of reinforcement learning in the continuous case by the means of viscosity solutions. *Machine Learning* | 2000-11-05 | Paper |