A neural network-based policy iteration algorithm with global \(H^2\)-superlinear convergence for stochastic games on domains (Q2031059)
From MaRDI portal
scientific article
Language | Label | Description | Also known as |
---|---|---|---|
English | A neural network-based policy iteration algorithm with global \(H^2\)-superlinear convergence for stochastic games on domains | scientific article | |
Statements
A neural network-based policy iteration algorithm with global \(H^2\)-superlinear convergence for stochastic games on domains (English)
0 references
8 June 2021
0 references
The following Hamilton-Jacobi-Bellman-Isaacs (HJBI) nonhomogeneous Dirichlet boundary value problem is considered: $F(u):=-a^{ij}(x)\partial_{ij}u+G(x,u,\nabla u)=0$, for a.e. $x\in\Omega$, $\tau u=g$ on $\partial\Omega$, with the nonlinear Hamiltonian \[ G(x,u,\nabla u)=\max_{\alpha\in A}\min_{\beta\in B}\bigl(b^i(x,\alpha,\beta)\partial_i u(x)+c(x,\alpha,\beta)u(x)-f(x,\alpha,\beta)\bigr). \] The aim is to investigate numerical algorithms for solving problems of this kind.

The second section is devoted to preliminaries. Under some assumptions on the coefficients, the uniqueness of the strong solution in $H^2(\Omega)$ is proved. The third section presents the policy iteration algorithm (Algorithm 1) for the Dirichlet problem, followed by its convergence analysis: results on the semismoothness of the HJBI operator, the q-superlinear convergence of Algorithm 1, and the global convergence of Algorithm 1 are proved.

In the fourth section the authors develop an inexact policy iteration algorithm for the Dirichlet problem. The idea is to compute an approximate solution of the linear Dirichlet problem for the iterate $u^{k+1}\in H^2(\Omega)$ in Algorithm 1 by solving an optimization problem over a set of trial functions, within a given accuracy. The resulting inexact policy iteration algorithm for the Dirichlet problem (Algorithm 2) is presented, and under some additional assumptions a global superlinear convergence result is proved.

The fifth section extends the developed iteration scheme to other boundary value problems and connects it with artificial neural network technology. One considers the HJBI oblique derivative problem \[ F(u):=-a^{ij}(x)\partial_{ij}u+G(x,u,\nabla u)=0,\text{ for a.e. }x\in\Omega, \qquad Bu:=\gamma^i\tau(\partial_i u)+\gamma^0\tau u=g,\text{ on }\partial\Omega. \] Under some assumptions on the coefficients, it is proved that the oblique derivative problem admits a unique strong solution in $H^2(\Omega)$.
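To illustrate the policy iteration (Howard) scheme described above, the following is a minimal sketch on a 1-D finite-difference discretisation of an HJBI Dirichlet problem with finite control sets. All coefficients, control sets, and grid parameters are illustrative assumptions, not those of the paper; the alternation of policy improvement (pointwise max-min) and policy evaluation (a linear solve for the frozen controls) is the point being demonstrated. For simplicity the minimising control enters only the source term, so the inner minimisation decouples from the iterate.

```python
import numpy as np

# Policy iteration sketch for a discretised HJBI Dirichlet problem
#   -u'' + max_a min_b ( b(a) u' + u - f(x,a,b) ) = 0,  u(0) = u(1) = 0,
# with upwinded first derivatives so each frozen-policy system is an M-matrix.
# Coefficients and control sets below are illustrative choices only.
N, h = 50, 1.0 / 51
xs = np.linspace(h, 1 - h, N)
A_set = [-1.0, 0.0, 1.0]          # maximising player's (finite) control set
B_set = [-1.0, 1.0]               # minimising player's (finite) control set

def coeffs(a):
    """Upwinded tridiagonal row for -u'' + b(a) u' + u."""
    b = 0.8 * a                   # illustrative drift, depends on a only
    lo = -1.0 / h**2 - (b / h if b > 0 else 0.0)
    hi = -1.0 / h**2 + (b / h if b < 0 else 0.0)
    diag = 2.0 / h**2 + 1.0 + abs(b) / h
    return lo, diag, hi

def source(x, a, b):
    return np.cos(np.pi * x) + 0.5 * a * b   # illustrative running cost

def residual(u, i, a, b):
    """Row i of the discrete operator minus the source, for controls (a, b)."""
    lo, d, hi = coeffs(a)
    ul = u[i - 1] if i > 0 else 0.0          # zero Dirichlet data
    ur = u[i + 1] if i < N - 1 else 0.0
    return lo * ul + d * u[i] + hi * ur - source(xs[i], a, b)

u = np.zeros(N)
alpha, beta = np.zeros(N), np.ones(N)
for it in range(50):
    # Policy improvement: pointwise max over a of min over b at the iterate u.
    new_a, new_b = alpha.copy(), beta.copy()
    for i in range(N):
        best = None
        for a in A_set:
            b_star = min(B_set, key=lambda b: residual(u, i, a, b))
            val = residual(u, i, a, b_star)
            if best is None or val > best[0]:
                best = (val, a, b_star)
        new_a[i], new_b[i] = best[1], best[2]
    # Policy evaluation: solve the linear Dirichlet problem for frozen controls.
    L, f = np.zeros((N, N)), np.zeros(N)
    for i in range(N):
        lo, d, hi = coeffs(new_a[i])
        L[i, i] = d
        if i > 0: L[i, i - 1] = lo
        if i < N - 1: L[i, i + 1] = hi
        f[i] = source(xs[i], new_a[i], new_b[i])
    u = np.linalg.solve(L, f)
    if np.array_equal(new_a, alpha) and np.array_equal(new_b, beta):
        break                     # policy is a fixed point: HJBI residual is 0
    alpha, beta = new_a, new_b

# Max-min residual of the discrete HJBI equation at the final iterate.
hjbi_res = max(abs(max(min(residual(u, i, a, b) for b in B_set)
                       for a in A_set)) for i in range(N))
```

With finite control sets and M-matrix rows, the outer iteration stabilises after finitely many sweeps, at which point the frozen-policy solve and the pointwise max-min agree and the discrete HJBI residual vanishes.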
For solving the oblique derivative problem a neural network-based policy iteration algorithm, Algorithm 3, is developed, and its global superlinear convergence is proved. The sixth section discusses in detail the application of the developed algorithms to the stochastic Zermelo navigation problem. Some fundamental results used in the article are collected at the end of the paper.
0 references
Hamilton-Jacobi-Bellman-Isaacs equations
0 references
neural networks
0 references
policy iteration
0 references
inexact semismooth Newton method
0 references
global convergence
0 references
\(q\)-superlinear convergence
0 references