Sample Efficient Policy Gradient Methods with Recursive Variance Reduction

Authors: Pan Xu, Felicia Gao, Quanquan Gu

Publication date: 18 September 2019

Companion code repository: https://github.com/xgfelicia/SRVRPG

Abstract: Improving the sample efficiency in reinforcement learning has been a long-standing research problem. In this work, we aim to reduce the sample complexity of existing policy gradient methods. We propose a novel policy gradient algorithm called SRVR-PG, which only requires $O(1/\epsilon^{3/2})$ episodes to find an $\epsilon$-approximate stationary point of the nonconcave performance function $J(\boldsymbol{\theta})$ (i.e., $\boldsymbol{\theta}$ such that $\|\nabla J(\boldsymbol{\theta})\|_2^2 \leq \epsilon$). This sample complexity improves the existing result $O(1/\epsilon^{5/3})$ for stochastic variance reduced policy gradient algorithms by a factor of $O(1/\epsilon^{1/6})$. In addition, we also propose a variant of SRVR-PG with parameter exploration, which explores the initial policy parameter from a prior probability distribution. We conduct numerical experiments on classic control problems in reinforcement learning to validate the performance of our proposed algorithms.
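The improvement factor quoted in the abstract follows from subtracting the two exponents: 5/3 - 3/2 = 1/6. The short Python sketch below checks that arithmetic and evaluates both bounds at a sample accuracy level. It is an illustration only, not code from the companion repository: the function names are hypothetical, and the constant hidden by the big-O notation is assumed to be 1, so the printed numbers are nominal scalings rather than real episode counts.

```python
from fractions import Fraction

# Exponents of 1/epsilon in the two episode-complexity bounds quoted in the abstract.
SRVR_PG_EXPONENT = Fraction(3, 2)  # SRVR-PG: O(1/eps^{3/2})
PRIOR_EXPONENT = Fraction(5, 3)    # earlier variance-reduced result: O(1/eps^{5/3})


def improvement_exponent() -> Fraction:
    """Exponent of 1/epsilon in the ratio (prior bound) / (SRVR-PG bound)."""
    return PRIOR_EXPONENT - SRVR_PG_EXPONENT


def nominal_episodes(epsilon: float, exponent: Fraction) -> float:
    """Return (1/epsilon)**exponent, ASSUMING the hidden big-O constant is 1."""
    return (1.0 / epsilon) ** float(exponent)


if __name__ == "__main__":
    eps = 1e-6  # arbitrary accuracy level chosen for illustration
    prior = nominal_episodes(eps, PRIOR_EXPONENT)
    srvr = nominal_episodes(eps, SRVR_PG_EXPONENT)
    print("improvement exponent:", improvement_exponent())
    print("nominal prior bound:", prior)
    print("nominal SRVR-PG bound:", srvr)
    print("ratio:", prior / srvr)
```

Because the exponent gap is only 1/6, the nominal saving grows slowly as the target accuracy tightens: under the unit-constant assumption, shrinking epsilon by a factor of one million multiplies the ratio by ten.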

