A unified DC programming framework and efficient DCA based approaches for large scale batch reinforcement learning (Q2633537)
Revision as of 04:44, 19 July 2024
scientific article
Language | Label | Description | Also known as |
---|---|---|---|
English | A unified DC programming framework and efficient DCA based approaches for large scale batch reinforcement learning | scientific article | |
Statements
A unified DC programming framework and efficient DCA based approaches for large scale batch reinforcement learning (English)
9 May 2019
This paper studies difference-of-convex-functions (DC) programming and applies the DC Algorithm (DCA) to reinforcement learning. The objective is to estimate an optimal policy in the Markov decision process (MDP) model. The authors address the problem by finding a zero of the empirical optimal Bellman residual (OBR) via linear approximation, using a unified approach based on DC programming and DCA. The main contributions are as follows: 1) the development of efficient DC algorithms based on minimisation of the $l_p$-norm of the empirical OBR; 2) a DCA with successive DC decomposition for the squared $l_2$-norm of the empirical OBR; 3) a new formulation of the OBR that does not use the $l_p$-norm. The results are illustrated by numerical examples.
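To make the idea of applying DCA to the empirical OBR concrete, the following is a minimal toy sketch and not the authors' algorithm. It assumes a scalar weight `theta`, a linear model $Q_\theta(s,a) = \varphi(s,a)\,\theta$, and the $l_1$-norm of the residual. Each residual is written as $g_i - h_i$ with $g_i(\theta) = r_i + \gamma \max_{a'} \varphi(s'_i,a')\,\theta$ convex and $h_i(\theta) = \varphi(s_i,a_i)\,\theta$ linear, and the sketch uses the standard identity $|g - h| = 2\max(g,h) - (g+h)$ to obtain a DC decomposition $G - H$. The samples, the constant `GAMMA`, and all function names are hypothetical.

```python
# Toy DCA sketch for the l_1-norm of the empirical OBR with a scalar weight theta.
# All data and names are hypothetical; this is not the paper's implementation.

GAMMA = 0.9  # assumed discount factor

# Each sample: (phi, r, next_phis), where phi is the feature of (s, a), r the
# reward, and next_phis the features of (s', a') for every action a'.
SAMPLES = [
    (1.0, 1.0, [0.5, 1.0]),
    (0.5, 0.0, [1.0, 0.2]),
    (1.0, 2.0, [0.3, 0.8]),
]


def g(theta, r, next_phis):
    """Convex part g_i(theta) = r + gamma * max_a' phi(s', a') * theta."""
    return r + GAMMA * max(p * theta for p in next_phis)


def obr_l1(theta):
    """l_1-norm of the empirical OBR: sum_i |g_i(theta) - h_i(theta)|."""
    return sum(abs(g(theta, r, nxt) - phi * theta) for phi, r, nxt in SAMPLES)


def G(theta):
    """First DC component: 2 * sum_i max(g_i(theta), h_i(theta))."""
    return 2.0 * sum(max(g(theta, r, nxt), phi * theta) for phi, r, nxt in SAMPLES)


def subgrad_H(theta):
    """One subgradient of the second DC component H = sum_i (g_i + h_i)."""
    y = 0.0
    for phi, _r, nxt in SAMPLES:
        best = max(nxt, key=lambda p: p * theta)  # feature of a maximising action
        y += GAMMA * best + phi
    return y


def breakpoints():
    """Candidate kinks of the piecewise-linear G: 0 and roots of g_i = h_i."""
    pts = [0.0]
    for phi, r, nxt in SAMPLES:
        for p in nxt:
            slope = phi - GAMMA * p
            if abs(slope) > 1e-12:
                pts.append(r / slope)
    return pts


def dca(theta0, iters=20):
    """DCA loop: linearise H at theta_k, then minimise G(theta) - y_k * theta."""
    theta, history = theta0, [obr_l1(theta0)]
    for _ in range(iters):
        y = subgrad_H(theta)
        # The convex subproblem is piecewise linear in one variable, so a
        # minimiser can be found among the kinks; theta itself is kept as a
        # candidate so the objective never increases.
        candidates = breakpoints() + [theta]
        theta = min(candidates, key=lambda t: G(t) - y * t)
        history.append(obr_l1(theta))
    return theta, history


theta_hat, history = dca(theta0=0.0)
print(theta_hat, history)
```

In this sketch the convex subproblem is solved by enumerating kinks only because `theta` is a scalar; with a vector of weights the same subproblem would typically be handed to a linear programming solver. The sequence stored in `history` is non-increasing, which is the usual descent property of DCA, but the limit is only guaranteed to be a critical point of $G - H$ rather than a global minimiser.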
batch reinforcement learning
Markov decision process
DC programming
DCA
optimal Bellman residual