A unified DC programming framework and efficient DCA based approaches for large scale batch reinforcement learning (Q2633537): Difference between revisions

This paper studies difference-of-convex-functions (DC) programming and applies the DC algorithm (DCA) to reinforcement learning. The objective is to estimate an optimal policy in the Markov decision process (MDP) model. The authors address the problem by seeking a zero of the empirical optimal Bellman residual (OBR) using a linear approximation, within a unified approach based on DC programming and DCA. The main contributions are: 1) efficient DC algorithms based on minimisation of the $l_p$-norm of the empirical OBR; 2) a DCA with successive DC decomposition for the squared $l_2$-norm of the empirical OBR; 3) a new formulation of the OBR that does not use the $l_p$-norm. The results are illustrated by numerical examples.
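
To make the terms in the review concrete, the following is a minimal sketch of how an $l_1$-type empirical OBR objective can be written as a difference of convex functions, together with the generic DCA iteration. The notation (samples $(s_i, a_i, r_i, s_i')$, feature map $\phi$, discount $\gamma$, parameter vector $\theta$) is assumed here for illustration, and the decomposition shown is one standard choice, not necessarily the one used in the paper.

```latex
% Illustrative notation (assumed, not taken from the paper):
% samples (s_i, a_i, r_i, s_i'), features \phi, discount \gamma, parameters \theta.
% Requires amsmath.
\begin{align*}
c_i(\theta) &= r_i + \gamma \max_{a'} \langle \theta, \phi(s_i', a') \rangle
               - \langle \theta, \phi(s_i, a_i) \rangle
  && \text{(empirical OBR at sample $i$; convex in $\theta$)} \\
\sum_{i=1}^{N} \lvert c_i(\theta) \rvert
  &= \underbrace{\sum_{i=1}^{N} 2 \max\{ c_i(\theta), 0 \}}_{g(\theta)}
   - \underbrace{\sum_{i=1}^{N} c_i(\theta)}_{h(\theta)}
  && \text{($g$ and $h$ convex)} \\
y^k &\in \partial h(\theta^k), \qquad
\theta^{k+1} \in \operatorname*{arg\,min}_{\theta}
  \bigl\{ g(\theta) - \langle y^k, \theta \rangle \bigr\}
  && \text{(generic DCA step)}
\end{align*}
```

The second line uses the identity $|c| = 2\max\{c, 0\} - c$, which yields a DC decomposition whenever $c$ is convex. Each DCA step replaces $h$ by its linearisation at $\theta^k$, so only a convex subproblem has to be solved.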

reviewed by

Anna Jaśkiewicz

zbMATH Keywords

batch reinforcement learning

Markov decision process

DC programming

DCA

optimal Bellman residual

Identifiers

zbMATH Open document ID

1434.90159

DOI

10.1007/s10898-018-0698-y

Mathematics Subject Classification ID

Sitelinks

Mathematics (1 entry)

mardi Publication:2633537

Revision as of 17:07, 6 August 2023, by Importer (bot): Created a new Item
Revision as of 10:22, 3 February 2024, by Import240129110113 (bot): Added link to MaRDI item.

links / mardi / name: (empty) → Publication:2633537