Regularized policy iteration with nonparametric function spaces

Authors Amir-Massoud Farahmand, Mohammad Ghavamzadeh, Csaba Szepesvári, Shie Mannor

Publication date 22 November 2016

Published in Journal of Machine Learning Research (JMLR)

Full work available at URL http://jmlr.csail.mit.edu/papers/v17/13-016.html

Keywords regularization, reinforcement learning, approximate policy iteration, finite-sample analysis, non-parametric method

Learning and adaptive systems in artificial intelligence (68T05)
Problem solving in the context of artificial intelligence (heuristics, search strategies, etc.) (68T20)
Markov and semi-Markov decision processes (90C40)

Recommendations

Nonconvex policy search using variational inequalities
Policy iterations for reinforcement learning problems in continuous time and space -- fundamental theory and methods
Non-parametric policy search with limited information loss
Approximate gradient methods in policy-space optimization of Markov reward processes
Reinforcement learning with approximation spaces
Smoothed functional-based gradient algorithms for off-policy reinforcement learning: a non-asymptotic viewpoint
Stochastic Policy Gradient Ascent in Reproducing Kernel Hilbert Spaces
Approximate policy optimization and adaptive control in regression models
Policy Gradient Methods for the Noisy Linear Quadratic Regulator over a Finite Horizon
On the theory of policy gradient methods: optimality, approximation, and distribution shift

Cited in (20)

A multiagent reinforcement learning framework for off-policy evaluation in two-sided markets
Off-policy estimation of long-term average outcomes with applications to mobile health
Batch policy learning in average reward Markov decision processes
Model selection in reinforcement learning
Performance guarantees for policy learning
Estimating Optimal Infinite Horizon Dynamic Treatment Regimes via pT-Learning
Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path
On high-order differentiability of the policy function
Low-Rank Representation of Reinforcement Learning Policies
Nonparametric approximation generalized policy iteration reinforcement learning algorithm based on states clustering
Projected state-action balancing weights for offline reinforcement learning
Value Enhancement of Reinforcement Learning via Efficient and Robust Trust Region Optimization
Optimal policy evaluation using kernel-based temporal difference methods
Analysis of classification-based policy iteration algorithms
Learning when-to-treat policies
Policy mirror descent for reinforcement learning: linear convergence, new sampling complexity, and generalized problem classes
Multi-agent reinforcement learning: a selective overview of theories and algorithms
Learning Near-Optimal Policies with Bellman-Residual Minimization Based Fitted Policy Iteration and a Single Sample Path
A review of stochastic algorithms with continuous value function approximation and some new approximate policy iteration algorithms for multidimensional continuous applications
A mathematical perspective of machine learning

