A unified DC programming framework and efficient DCA based approaches for large scale batch reinforcement learning (Q2633537)
From MaRDI portal
Revision as of 07:55, 5 March 2024
Language | Label | Description | Also known as
---|---|---|---
English | A unified DC programming framework and efficient DCA based approaches for large scale batch reinforcement learning | scientific article |
Statements
A unified DC programming framework and efficient DCA based approaches for large scale batch reinforcement learning (English)
9 May 2019
This paper studies difference-of-convex-functions (DC) programming and applies the DC Algorithm (DCA) to batch reinforcement learning. The objective is to estimate an optimal policy in the Markov decision process (MDP) model. The authors address the problem by finding a zero of the empirical optimal Bellman residual (OBR) under linear function approximation, within a unified framework based on DC programming and DCA. The main contributions are as follows: 1) efficient DC algorithms based on minimisation of the $l_p$-norm of the empirical OBR; 2) a DCA with successive DC decomposition for the squared $l_2$-norm of the empirical OBR; 3) a new formulation of the OBR problem that avoids the $l_p$-norm. The results are illustrated by numerical experiments.
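The DCA scheme the review refers to minimises a function written as a difference of two convex functions, f = g - h, by linearising h at the current iterate and solving the resulting convex subproblem. A minimal one-dimensional sketch of this generic iteration (the toy objective and the helper names `subgrad_h` and `argmin_convex` are illustrative assumptions, not the paper's actual Bellman-residual algorithm):

```python
def dca(x0, subgrad_h, argmin_convex, tol=1e-8, max_iter=100):
    """Generic DCA loop for minimising f = g - h (g, h convex):
    linearise h at the current iterate via a subgradient y, then
    solve the convex subproblem min_x g(x) - y*x exactly."""
    x = x0
    for _ in range(max_iter):
        y = subgrad_h(x)            # subgradient of h at the current iterate
        x_next = argmin_convex(y)   # minimiser of the convex surrogate g(x) - y*x
        if abs(x_next - x) < tol:   # stop when iterates stabilise
            break
        x = x_next
    return x_next

# Toy DC objective f(x) = x**2 - 2|x|, with g(x) = x**2 and h(x) = 2|x|.
subgrad_h = lambda x: 2.0 if x > 0 else (-2.0 if x < 0 else 0.0)
argmin_convex = lambda y: y / 2.0   # closed form: argmin_x x**2 - y*x

x_star = dca(0.3, subgrad_h, argmin_convex)
print(x_star)  # → 1.0, a global minimiser of the toy objective
```

In the paper's setting, the convex subproblem is built from the $l_p$-norm of the empirical OBR rather than this closed-form toy; the iteration structure, however, is the same.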
batch reinforcement learning
Markov decision process
DC programming
DCA
optimal Bellman residual