A unified DC programming framework and efficient DCA based approaches for large scale batch reinforcement learning (Q2633537)

From MaRDI portal
scientific article

    Statements

    A unified DC programming framework and efficient DCA based approaches for large scale batch reinforcement learning (English)
    9 May 2019
    This paper applies difference-of-convex-functions (DC) programming and the DC Algorithm (DCA) to batch reinforcement learning. The objective is to estimate an optimal policy in a Markov decision process (MDP). The authors approach the problem by seeking a zero of the empirical optimal Bellman residual (OBR) via linear function approximation, within a unified framework based on DC programming and DCA. The main contributions are: 1) efficient DC algorithms based on minimisation of the $l_p$-norm of the empirical OBR; 2) a DCA with successive DC decomposition for the squared $l_2$-norm of the empirical OBR; 3) a new formulation of the OBR problem that does not use the $l_p$-norm. The results are illustrated by numerical examples.
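To make the DCA idea mentioned in the review concrete, here is a minimal sketch of the generic scheme on a toy one-dimensional problem. It is not the authors' algorithm and does not involve the OBR. For an objective $f = g - h$ with $g$ and $h$ convex, DCA linearises $h$ at the current iterate and minimises the resulting convex function. The toy objective $f(x) = x^4 - x^2$, its decomposition $g(x) = x^4$, $h(x) = x^2$, and all function names below are assumptions made purely for illustration.

```python
import math


def dca(grad_h, solve_convex_subproblem, x0, tol=1e-12, max_iter=1000):
    """Generic DCA loop for f = g - h with g, h convex (h differentiable here)."""
    x = x0
    for _ in range(max_iter):
        y = grad_h(x)                       # linearise h at the current iterate
        x_new = solve_convex_subproblem(y)  # argmin over x of g(x) - y * x
        if abs(x_new - x) < tol:            # stop when the iterates stall
            return x_new
        x = x_new
    return x


# Toy DC objective (hypothetical, for illustration only):
# f(x) = x**4 - x**2 with g(x) = x**4 and h(x) = x**2.
def f(x):
    return x ** 4 - x ** 2


def grad_h(x):
    return 2.0 * x


def solve_convex_subproblem(y):
    # Minimise x**4 - y*x in closed form: 4*x**3 = y, so x is the
    # real cube root of y / 4 (copysign keeps the sign for negative y).
    return math.copysign((abs(y) / 4.0) ** (1.0 / 3.0), y)


x_star = dca(grad_h, solve_convex_subproblem, x0=1.0)
print(x_star, f(x_star))
```

Started from `x0=1.0`, the iterates approach the critical point $x = 1/\sqrt{2}$, where $f = -1/4$. Starting from a negative point leads to the mirrored critical point instead, which reflects the fact that DCA is a local method whose result depends on initialisation.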
    batch reinforcement learning
    Markov decision process
    DC programming
    DCA
    optimal Bellman residual