A unified DC programming framework and efficient DCA based approaches for large scale batch reinforcement learning (Q2633537)

From MaRDI portal
Revision as of 00:48, 12 February 2024

scientific article
Language: English

    Statements

    A unified DC programming framework and efficient DCA based approaches for large scale batch reinforcement learning (English)
    9 May 2019
    This paper studies Difference of Convex functions (DC) programming and applies the DC Algorithm (DCA) to batch reinforcement learning. The objective is to estimate an optimal policy in a Markov decision process (MDP) model. The authors cast the problem as finding a zero of the empirical optimal Bellman residual (OBR) under a linear approximation of the value function, and solve it within a unified framework based on DC programming and DCA. The main contributions are: 1) efficient DC algorithms based on minimisation of the $\ell_p$-norm of the empirical OBR; 2) a DCA with successive DC decompositions for the squared $\ell_2$-norm of the empirical OBR; 3) a new formulation of the OBR problem that avoids the $\ell_p$-norm. The results are illustrated by numerical experiments.
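    The DCA scheme the paper builds on can be sketched on a toy problem (this is an illustrative sketch, not the authors' implementation, and the objective below is a made-up scalar example rather than the empirical OBR): given a DC decomposition $f = g - h$ with $g, h$ convex, DCA linearises the concave part $-h$ at the current iterate and solves the resulting convex subproblem.

    ```python
    # Toy DC program: f(x) = x**4 - 4*x**2, decomposed as
    #   g(x) = x**4   (convex)   and   h(x) = 4*x**2   (convex).
    # DCA iteration:
    #   y_k     = h'(x_k) = 8*x_k          (subgradient of the convex part h)
    #   x_{k+1} = argmin_x g(x) - y_k * x  (convex subproblem)
    # Here the subproblem has the closed form 4*x**3 = y_k,
    # i.e. x_{k+1} = (y_k / 4) ** (1/3).

    def dca(x0, iters=100):
        x = x0
        for _ in range(iters):
            y = 8.0 * x  # y_k = h'(x_k)
            # Solve min_x x**4 - y*x in closed form (real cube root of y/4):
            x = (y / 4.0) ** (1.0 / 3.0) if y >= 0 else -((-y / 4.0) ** (1.0 / 3.0))
        return x

    x_star = dca(1.0)
    # DCA converges to the critical point x = sqrt(2) of f
    ```

    Each iterate decreases $f$, and the sequence converges to a critical point of the DC program; in the paper this scheme is applied with the squared $\ell_2$-norm (and $\ell_p$-norms) of the empirical OBR in place of the toy objective, with the convex subproblems solved numerically rather than in closed form.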
    batch reinforcement learning
    Markov decision process
    DC programming
    DCA
    optimal Bellman residual

    Identifiers