A neural network-based policy iteration algorithm with global \(H^2\)-superlinear convergence for stochastic games on domains (Q2031059)

From MaRDI portal
scientific article

    Statements

    A neural network-based policy iteration algorithm with global \(H^2\)-superlinear convergence for stochastic games on domains (English)
    8 June 2021
The following Hamilton-Jacobi-Bellman-Isaacs (HJBI) nonhomogeneous Dirichlet boundary value problem is considered:
\[ F(u):=-a^{ij}(x)\partial_{ij}u+G(x,u,\nabla u)=0,\ \text{ for a.e. }x\in\Omega,\qquad \tau u=g\ \text{ on }\partial\Omega, \]
with the nonlinear Hamiltonian
\[ G(x,u,\nabla u)=\max_{\alpha\in A}\min_{\beta\in B}\bigl(b^i(x,\alpha,\beta)\partial_i u(x)+c(x,\alpha,\beta)u(x)-f(x,\alpha,\beta)\bigr). \]
The aim is to investigate numerical algorithms for solving this class of problems.

The second section is devoted to preliminaries. Under some assumptions on the coefficients, the uniqueness of the strong solution in $H^2(\Omega)$ is proved. In the third section the policy iteration algorithm for the Dirichlet problem (Algorithm 1) is presented, followed by its convergence analysis: results on the semismoothness of the HJBI operator, the $q$-superlinear convergence of Algorithm 1 and the global convergence of Algorithm 1 are proved.

In the fourth section the authors develop an inexact policy iteration algorithm for the Dirichlet problem. The idea is to compute, within a given accuracy, an approximate solution of the linear Dirichlet problem defining the iterate $u^{k+1}\in H^2(\Omega)$ in Algorithm 1, by solving an optimization problem over a set of trial functions. The resulting inexact policy iteration algorithm for the Dirichlet problem (Algorithm 2) is presented and, under some additional assumptions, its global superlinear convergence is proved.

The fifth section extends the developed iteration scheme to other boundary value problems and connects it to artificial neural networks. One considers the HJBI oblique derivative problem
\[ F(u):=-a^{ij}(x)\partial_{ij}u+G(x,u,\nabla u)=0,\ \text{ for a.e. }x\in\Omega,\qquad Bu:=\gamma^i\tau(\partial_i u)+\gamma^0\tau u-g=0\ \text{ on }\partial\Omega. \]
Under some assumptions on the coefficients, it is proved that the oblique derivative problem admits a unique strong solution in $H^2(\Omega)$. For solving the oblique derivative problem a neural network-based policy iteration algorithm (Algorithm 3) is developed, and its global superlinear convergence is proved.

The sixth section contains an extensive discussion of applications of the developed algorithms to the stochastic Zermelo navigation problem. Fundamental results used in the article are recalled at the end of the paper.
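To make the structure of the iteration described above concrete, here is a minimal, purely illustrative Python/PyTorch sketch of inexact policy iteration with neural-network trial functions for a toy one-dimensional analogue of the Dirichlet problem. The control sets, the coefficient functions (drift, running_cost, the constant c0), the network architecture and the optimizer settings are assumptions of this sketch and are not taken from the paper; only the structure mirrors the scheme: each outer step freezes the controls by a pointwise argmax/argmin of the Hamiltonian at the current iterate, then solves the linearized problem only approximately by minimizing its residual over the trial functions.

```python
# A minimal sketch, not the authors' implementation: inexact policy iteration with a
# neural-network ansatz for a toy 1D analogue of the HJBI Dirichlet problem
#   -u'' + max_{a in A} min_{b in B} ( b(a,b) u' + c u - f(x,a,b) ) = 0 on (0,1),
#   u(0) = u(1) = 0.
# All control sets, coefficients and hyperparameters below are illustrative choices.
import math
import torch

torch.manual_seed(0)
A = torch.tensor([-1.0, 1.0])      # finite control set of the maximizing player (assumed)
B = torch.tensor([-1.0, 1.0])      # finite control set of the minimizing player (assumed)
c0 = 1.0                           # zeroth-order coefficient c, taken constant here

def drift(a, b):                   # first-order coefficient b(x, a, b), constant in x here
    return a + 0.5 * b

def running_cost(x, a, b):         # source term f(x, a, b)
    return torch.sin(math.pi * x) + 0.1 * a * b

class TrialFunction(torch.nn.Module):
    """Small MLP multiplied by x(1-x), so the homogeneous Dirichlet data holds exactly."""
    def __init__(self, width=32):
        super().__init__()
        self.body = torch.nn.Sequential(
            torch.nn.Linear(1, width), torch.nn.Tanh(),
            torch.nn.Linear(width, width), torch.nn.Tanh(),
            torch.nn.Linear(width, 1),
        )

    def forward(self, x):
        return x * (1.0 - x) * self.body(x)

def derivatives(net, x):
    """Value, first and second derivatives of the trial function at the collocation points."""
    u = net(x)
    ux = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
    uxx = torch.autograd.grad(ux.sum(), x, create_graph=True)[0]
    return u, ux, uxx

net = TrialFunction()
x = torch.rand(256, 1, requires_grad=True)            # interior collocation points

for k in range(10):                                    # outer policy iteration
    # Policy update: pointwise argmax over A of the min over B of the Hamiltonian
    # evaluated at the current iterate; the controls are then frozen for the linear solve.
    u, ux, _ = derivatives(net, x)
    H = (drift(A.view(1, -1, 1), B.view(1, 1, -1)) * ux.detach().unsqueeze(-1)
         + c0 * u.detach().unsqueeze(-1)
         - running_cost(x.detach().unsqueeze(-1), A.view(1, -1, 1), B.view(1, 1, -1)))
    inner, ib_all = H.min(dim=2)                       # minimize over b for every a
    _, ia = inner.max(dim=1)                           # then maximize over a
    ib = ib_all.gather(1, ia.unsqueeze(1)).squeeze(1)
    a_k, b_k = A[ia].unsqueeze(1), B[ib].unsqueeze(1)

    # Inexact linear solve: minimize the residual of the linearized PDE over the trial
    # functions; a fixed number of Adam steps stands in for the inner optimization.
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)
    for _ in range(500):
        u, ux, uxx = derivatives(net, x)
        residual = -uxx + drift(a_k, b_k) * ux + c0 * u - running_cost(x, a_k, b_k)
        loss = residual.pow(2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    print(f"policy iteration {k}: mean squared residual {loss.item():.3e}")
```

Enforcing the homogeneous Dirichlet data through the factor $x(1-x)$ is just one simple way to keep every trial function admissible in this sketch; a penalty on the boundary residual would be an equally reasonable choice.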
    Hamilton-Jacobi-Bellman-Isaacs equations
    neural networks
    policy iteration
    inexact semismooth Newton method
    global convergence
    \(q\)-superlinear convergence

    Identifiers
