A unified DC programming framework and efficient DCA based approaches for large scale batch reinforcement learning (Q2633537): Difference between revisions

This paper studies difference-of-convex-functions (DC) programming and applies the DC algorithm (DCA) to batch reinforcement learning. The objective is to estimate an optimal policy in a Markov decision process (MDP). The authors approach the problem by seeking a zero of the empirical optimal Bellman residual (OBR) under linear function approximation, using a unified framework based on DC programming and DCA. The main contributions are: 1) efficient DC algorithms based on minimisation of the $l_p$-norm of the empirical OBR; 2) a DCA with successive DC decomposition for the squared $l_2$-norm of the empirical OBR; 3) a new formulation of the OBR that does not use the $l_p$-norm. The results are illustrated by numerical examples.
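To make the DC structure mentioned in the review concrete, here is a minimal sketch of the $l_1$-norm of an empirical OBR under a linear approximation $Q_\theta(s,a) = \phi(s,a)^\top \theta$. It uses the standard identity $|g - h| = 2\max(g, h) - (g + h)$ to split the objective into two convex parts. The decomposition, the sample layout, and all names (`Sample`, `convex_parts`, `obr_l1`, `dc_components`) are illustrative assumptions, not necessarily the formulation used in the paper.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
# Hypothetical sample layout: (phi(s, a), reward, [phi(s', a') for every action a'])
Sample = Tuple[Vector, float, List[Vector]]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(u, v))


def convex_parts(theta: Vector, sample: Sample, gamma: float) -> Tuple[float, float]:
    """Return (g, h) with residual = g - h.

    g = r + gamma * max_a' phi(s', a')^T theta is convex in theta (max of linear maps);
    h = phi(s, a)^T theta is linear in theta.
    """
    phi_sa, reward, phi_next = sample
    g = reward + gamma * max(dot(p, theta) for p in phi_next)
    h = dot(phi_sa, theta)
    return g, h


def obr_l1(theta: Vector, samples: List[Sample], gamma: float) -> float:
    """l_1-norm of the empirical OBR: sum_i |g_i - h_i|."""
    total = 0.0
    for s in samples:
        g, h = convex_parts(theta, s, gamma)
        total += abs(g - h)
    return total


def dc_components(theta: Vector, samples: List[Sample], gamma: float) -> Tuple[float, float]:
    """Return (G, H), both convex in theta, such that obr_l1 = G - H."""
    G = 0.0
    H = 0.0
    for s in samples:
        g, h = convex_parts(theta, s, gamma)
        G += 2.0 * max(g, h)  # max of convex functions is convex
        H += g + h            # sum of convex functions is convex
    return G, H


if __name__ == "__main__":
    # Toy data, made up for illustration only.
    toy_samples: List[Sample] = [
        ([1.0, 0.0], 1.0, [[0.0, 1.0], [1.0, 1.0]]),
        ([0.0, 1.0], 0.0, [[1.0, 0.0], [0.5, 0.5]]),
    ]
    toy_theta = [0.5, -1.0]
    G, H = dc_components(toy_theta, toy_samples, gamma=0.9)
    print(G - H, obr_l1(toy_theta, toy_samples, gamma=0.9))
```

Given such a split, a DCA iteration would linearise H at the current iterate and minimise the resulting convex function; with this particular split the convex subproblem can be written as a linear program. That step is omitted here to keep the sketch dependency-free.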

reviewed by

Anna Jaśkiewicz

zbMATH Keywords

batch reinforcement learning

Markov decision process

DC programming

dca

optimal Bellman residual

MaRDI profile type

MaRDI publication profile

full work available at URL

https://doi.org/10.1007/s10898-018-0698-y

cites work

Learning Near-Optimal Policies with Bellman-Residual Minimization Based Fitted Policy Iteration and a Single Sample Path

Q3245701

Q3795523

Q4257216

Natural actor-critic algorithms

Optimization of the norm of a vector-valued DC function and applications

On the norm of a dc function

Approximate dynamic programming with a fuzzy parameterization

Q4420767

An interior proximal linearized method for DC programming based on Bregman distance or second-order homogeneous kernels

Q3093261

A Method for Finding Structured Sparse Solutions to Nonnegative Least Squares Problems with Applications

Minimizing nonsmooth DC functions via successive DC piecewise-affine approximations

A Tutorial on Linear Function Approximators for Dynamic Programming and Reinforcement Learning

Reinforcement Learning: A Tutorial Survey and Recent Advances

Solving an Infinite-Horizon Discounted Markov Decision Process by DC Programming and DCA

Double Bundle Method for finding Clarke Stationary Points in Nonsmooth DC Programming

A proximal bundle method for nonsmooth DC optimization utilizing nonconvex cutting planes

Convergence of convex functions and duality

10.1162/1532443041827907

The DC (Difference of convex functions) programming and DCA revisited with DC models of real world nonconvex optimization problems

Feature selection in machine learning: an exact penalty approach using a difference of convex function algorithm

Self-organizing maps by difference of convex functions optimization

A DC Programming Approach for Finding Communities in Networks

Solving a class of linearly constrained indefinite quadratic problems by DC algorithms

DC programming and DCA: thirty years of developments

DC approximation approaches for sparse optimization

Feature selection for linear SVMs under uncertain data: robust optimization based on difference of convex functions algorithms

Performance Bounds in $L_p$‐norm for Approximate Value Iteration

Proximal bundle methods for nonsmooth DC programming

An inertial algorithm for DC programming

Q3780016

Convex analysis approach to d. c. programming: Theory, algorithms and applications

A D.C. Optimization Algorithm for Solving the Trust-Region Subproblem

Q4315289

Convex Analysis

On the relations between two types of convergence for convex functions

Discrete tomography by convex--concave regularization and D.C. programming

Generalized polynomial approximations in Markovian decision processes

Q5850827

Convergence results for single-step on-policy reinforcement-learning algorithms

Global convergence of a proximal linearized algorithm for difference of convex functions

Algorithms for Reinforcement Learning

Aggregate codifferential method for nonsmooth DC optimization

Q4261789

Reinforcement learning algorithms with function approximation: recent advances and applications

Identifiers

zbMATH Open document ID

1434.90159

DOI

10.1007/s10898-018-0698-y

Mathematics Subject Classification ID

0 references

0 references

0 references

0 references

0 references

Sitelinks

Mathematics (1 entry)

mardi Publication:2633537

@@ Property / MaRDI profile type @@
+MaRDI publication profile
@@ Property / MaRDI profile type: MaRDI publication profile / rank @@
+Normal rank
@@ Property / full work available at URL @@
+https://doi.org/10.1007/s10898-018-0698-y
+Normal rank
@@ Property / OpenAlex ID @@
+W2888238026
@@ Property / OpenAlex ID: W2888238026 / rank @@
+Normal rank
@@ Property / cites work @@
+Learning Near-Optimal Policies with Bellman-Residual Minimization Based Fitted Policy Iteration and a Single Sample Path
+Normal rank
@@ Property / cites work @@
+Q3245701
@@ Property / cites work: Q3245701 / rank @@
+Normal rank
@@ Property / cites work @@
+Q3795523
@@ Property / cites work: Q3795523 / rank @@
+Normal rank
@@ Property / cites work @@
+Q4257216
@@ Property / cites work: Q4257216 / rank @@
+Normal rank
@@ Property / cites work @@
+Natural actor-critic algorithms
@@ Property / cites work: Natural actor-critic algorithms / rank @@
+Normal rank
@@ Property / cites work @@
+Optimization of the norm of a vector-valued DC function and applications
+Normal rank
@@ Property / cites work @@
+On the norm of a dc function
@@ Property / cites work: On the norm of a dc function / rank @@
+Normal rank
@@ Property / cites work @@
+Approximate dynamic programming with a fuzzy parameterization
+Normal rank
@@ Property / cites work @@
+Q4420767
@@ Property / cites work: Q4420767 / rank @@
+Normal rank
@@ Property / cites work @@
+An interior proximal linearized method for DC programming based on Bregman distance or second-order homogeneous kernels
+Normal rank
@@ Property / cites work @@
+Q3093261
@@ Property / cites work: Q3093261 / rank @@
+Normal rank
@@ Property / cites work @@
+A Method for Finding Structured Sparse Solutions to Nonnegative Least Squares Problems with Applications
+Normal rank
@@ Property / cites work @@
+Minimizing nonsmooth DC functions via successive DC piecewise-affine approximations
+Normal rank
@@ Property / cites work @@
+A Tutorial on Linear Function Approximators for Dynamic Programming and Reinforcement Learning
+Normal rank
@@ Property / cites work @@
+Reinforcement Learning: A Tutorial Survey and Recent Advances
+Normal rank
@@ Property / cites work @@
+Solving an Infinite-Horizon Discounted Markov Decision Process by DC Programming and DCA
+Normal rank
@@ Property / cites work @@
+Double Bundle Method for finding Clarke Stationary Points in Nonsmooth DC Programming
+Normal rank
@@ Property / cites work @@
+A proximal bundle method for nonsmooth DC optimization utilizing nonconvex cutting planes
+Normal rank
@@ Property / cites work @@
+Convergence of convex functions and duality
@@ Property / cites work: Convergence of convex functions and duality / rank @@
+Normal rank
@@ Property / cites work @@
+10.1162/1532443041827907
@@ Property / cites work: 10.1162/1532443041827907 / rank @@
+Normal rank
@@ Property / cites work @@
+The DC (Difference of convex functions) programming and DCA revisited with DC models of real world nonconvex optimization problems
+Normal rank
@@ Property / cites work @@
+Feature selection in machine learning: an exact penalty approach using a difference of convex function algorithm
+Normal rank
@@ Property / cites work @@
+Self-organizing maps by difference of convex functions optimization
+Normal rank
@@ Property / cites work @@
+A DC Programming Approach for Finding Communities in Networks
+Normal rank
@@ Property / cites work @@
+Solving a class of linearly constrained indefinite quadratic problems by DC algorithms
+Normal rank
@@ Property / cites work @@
+DC programming and DCA: thirty years of developments
+Normal rank
@@ Property / cites work @@
+DC approximation approaches for sparse optimization
+Normal rank
@@ Property / cites work @@
+Feature selection for linear SVMs under uncertain data: robust optimization based on difference of convex functions algorithms
+Normal rank
@@ Property / cites work @@
+Performance Bounds in $L_p$‐norm for Approximate Value Iteration
+Normal rank
@@ Property / cites work @@
+Proximal bundle methods for nonsmooth DC programming
+Normal rank
@@ Property / cites work @@
+An inertial algorithm for DC programming
@@ Property / cites work: An inertial algorithm for DC programming / rank @@
+Normal rank
@@ Property / cites work @@
+Q3780016
@@ Property / cites work: Q3780016 / rank @@
+Normal rank
@@ Property / cites work @@
+Convex analysis approach to d. c. programming: Theory, algorithms and applications
+Normal rank
@@ Property / cites work @@
+A D.C. Optimization Algorithm for Solving the Trust-Region Subproblem
+Normal rank
@@ Property / cites work @@
+Q4315289
@@ Property / cites work: Q4315289 / rank @@
+Normal rank
@@ Property / cites work @@
+Convex Analysis
@@ Property / cites work: Convex Analysis / rank @@
+Normal rank
@@ Property / cites work @@
+On the relations between two types of convergence for convex functions
+Normal rank
@@ Property / cites work @@
+Discrete tomography by convex--concave regularization and D.C. programming
+Normal rank
@@ Property / cites work @@
+Generalized polynomial approximations in Markovian decision processes
+Normal rank
@@ Property / cites work @@
+Q5850827
@@ Property / cites work: Q5850827 / rank @@
+Normal rank
@@ Property / cites work @@
+Convergence results for single-step on-policy reinforcement-learning algorithms
+Normal rank
@@ Property / cites work @@
+Global convergence of a proximal linearized algorithm for difference of convex functions
+Normal rank
@@ Property / cites work @@
+Algorithms for Reinforcement Learning
@@ Property / cites work: Algorithms for Reinforcement Learning / rank @@
+Normal rank
@@ Property / cites work @@
+Aggregate codifferential method for nonsmooth DC optimization
+Normal rank
@@ Property / cites work @@
+Q4261789
@@ Property / cites work: Q4261789 / rank @@
+Normal rank
@@ Property / cites work @@
+Reinforcement learning algorithms with function approximation: recent advances and applications
+Normal rank
@@ Property / Wikidata QID @@
+Q129400449
@@ Property / Wikidata QID: Q129400449 / rank @@
+Normal rank