Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path
DOI: 10.1007/s10994-007-5038-2 · zbMath: 1470.68072 · OpenAlex: W2104753538 · MaRDI QID: Q1009248
Csaba Szepesvári, András Antos, Rémi Munos
Publication date: 31 March 2009
Published in: Machine Learning
Full work available at URL: https://doi.org/10.1007/s10994-007-5038-2
reinforcement learning; nonparametric regression; policy iteration; finite-sample bounds; off-policy learning; least-squares regression; Bellman-residual minimization; least-squares temporal difference learning
Nonparametric regression and quantile regression (62G08) Learning and adaptive systems in artificial intelligence (68T05)
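The entry's core technique, Bellman-residual minimization (BRM), can be illustrated with a minimal policy-evaluation sketch. The toy MDP, the one-hot features, and the use of a known transition model below are illustrative assumptions only; the paper itself studies the harder setting of empirical residuals estimated from a single sample path.

```python
import numpy as np

# Minimal sketch of Bellman-residual minimization for evaluating a fixed
# policy with a linear value-function class V(s) = phi(s) @ w.
# The 3-state MDP below is a made-up illustration.

gamma = 0.9
n_states = 3
# Transition matrix P[s, s'] under the fixed policy, and rewards r[s].
P = np.array([[0.8, 0.2, 0.0],
              [0.1, 0.6, 0.3],
              [0.0, 0.3, 0.7]])
r = np.array([1.0, 0.0, 2.0])

# Feature map: one-hot features, so the class can represent any V exactly.
Phi = np.eye(n_states)

# Bellman residual: BR(w) = Phi @ w - (r + gamma * P @ Phi @ w).
# Minimizing ||BR(w)||^2 over w is a linear least-squares problem
# A @ w ~= r with A = Phi - gamma * P @ Phi.
A = Phi - gamma * P @ Phi
w, *_ = np.linalg.lstsq(A, r, rcond=None)
V = Phi @ w

# With exact features, the BRM solution recovers the true value function
# V = (I - gamma * P)^{-1} r.
V_true = np.linalg.solve(np.eye(n_states) - gamma * P, r)
print(np.allclose(V, V_true))  # True
```

In fitted policy iteration, this evaluation step alternates with greedy policy improvement; the paper's contribution is finite-sample performance bounds for that loop when the residual is minimized over a nonparametric function class using dependent data from one trajectory.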
Related Items (19)
Cites Work
- Markov chains and stochastic stability
- Generalized polynomial approximations in Markovian decision processes
- Stochastic optimal control. The discrete time case
- Mixing: Properties and examples
- Rates of convergence for empirical processes of stationary mixing sequences
- Sphere packing numbers for subsets of the Boolean \(n\)-cube with bounded Vapnik-Chervonenkis dimension
- Nonparametric time series prediction through adaptive model selection
- Kernel-based reinforcement learning
- Histogram regression estimation using data-dependent partitions
- A distribution-free theory of nonparametric regression
- Adaptive estimation in autoregression or \(\beta\)-mixing regression via model selection
- Functional Approximations and Dynamic Programming
- Mixing Conditions for Markov Chains
- MIXING AND MOMENT PROPERTIES OF VARIOUS GARCH AND STOCHASTIC VOLATILITY MODELS
- 10.1162/1532443041827907
- Neural Network Learning
- Learning Near-Optimal Policies with Bellman-Residual Minimization Based Fitted Policy Iteration and a Single Sample Path
- Convergence of stochastic processes