Regularized policy iteration with nonparametric function spaces
zbMATH Open: 1392.68345; MaRDI QID: Q2834459; FDO: Q2834459
Authors: Amir-Massoud Farahmand, Mohammad Ghavamzadeh, Csaba Szepesvári, Shie Mannor
Publication date: 22 November 2016
Published in: Journal of Machine Learning Research (JMLR)
Full work available at URL: http://jmlr.csail.mit.edu/papers/v17/13-016.html
Recommendations
- Nonconvex policy search using variational inequalities
- Policy iterations for reinforcement learning problems in continuous time and space -- fundamental theory and methods
- Non-parametric policy search with limited information loss
- Approximate gradient methods in policy-space optimization of Markov reward processes
- Reinforcement learning with approximation spaces
- Smoothed functional-based gradient algorithms for off-policy reinforcement learning: a non-asymptotic viewpoint
- Stochastic Policy Gradient Ascent in Reproducing Kernel Hilbert Spaces
- Approximate policy optimization and adaptive control in regression models
- Policy Gradient Methods for the Noisy Linear Quadratic Regulator over a Finite Horizon
- On the theory of policy gradient methods: optimality, approximation, and distribution shift
Keywords: regularization; reinforcement learning; approximate policy iteration; finite-sample analysis; non-parametric method
Learning and adaptive systems in artificial intelligence (68T05); Problem solving in the context of artificial intelligence (heuristics, search strategies, etc.) (68T20); Markov and semi-Markov decision processes (90C40)
Cited In (20)
- A multiagent reinforcement learning framework for off-policy evaluation in two-sided markets
- Off-policy estimation of long-term average outcomes with applications to mobile health
- Batch policy learning in average reward Markov decision processes
- Estimating Optimal Infinite Horizon Dynamic Treatment Regimes via pT-Learning
- Model selection in reinforcement learning
- Performance guarantees for policy learning
- Low-Rank Representation of Reinforcement Learning Policies
- Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path
- On high-order differentiability of the policy function
- Nonparametric approximation generalized policy iteration reinforcement learning algorithm based on states clustering
- Projected state-action balancing weights for offline reinforcement learning
- Value Enhancement of Reinforcement Learning via Efficient and Robust Trust Region Optimization
- Optimal policy evaluation using kernel-based temporal difference methods
- Analysis of classification-based policy iteration algorithms
- Learning when-to-treat policies
- Policy mirror descent for reinforcement learning: linear convergence, new sampling complexity, and generalized problem classes
- Learning Near-Optimal Policies with Bellman-Residual Minimization Based Fitted Policy Iteration and a Single Sample Path
- Multi-agent reinforcement learning: a selective overview of theories and algorithms
- A review of stochastic algorithms with continuous value function approximation and some new approximate policy iteration algorithms for multidimensional continuous applications
- A mathematical perspective of machine learning