A neural network-based policy iteration algorithm with global \(H^2\)-superlinear convergence for stochastic games on domains (Q2031059)

From MaRDI portal
scientific article

    Statements

    A neural network-based policy iteration algorithm with global \(H^2\)-superlinear convergence for stochastic games on domains (English)
    8 June 2021
The following Hamilton-Jacobi-Bellman-Isaacs (HJBI) nonhomogeneous Dirichlet boundary value problem is considered:
\[ F(u):=-a^{ij}(x)\partial_{ij}u+G(x,u,\nabla u)=0,\ \text{ for a.e. }x\in\Omega,\qquad \tau u=g\ \text{ on }\partial\Omega, \]
with the nonlinear Hamiltonian
\[ G(x,u,\nabla u)=\max_{\alpha\in A}\min_{\beta\in B}\bigl(b^i(x,\alpha,\beta)\partial_i u(x)+c(x,\alpha,\beta)u(x)-f(x,\alpha,\beta)\bigr). \]
The aim is to investigate numerical algorithms for solving this class of problems.

The second section is devoted to preliminaries. Under some assumptions on the coefficients, the uniqueness of the strong solution in $H^2(\Omega)$ is proved. In the third section the policy iteration algorithm for the Dirichlet problem (Algorithm 1) is presented, followed by its convergence analysis: results on the semismoothness of the HJBI operator, the $q$-superlinear convergence of Algorithm 1 and the global convergence of Algorithm 1 are proved.

In the fourth section the authors develop an inexact policy iteration algorithm for the Dirichlet problem. The idea is to compute, within a given accuracy, an approximate solution of the linear Dirichlet problem defining the iterate $u^{k+1}\in H^2(\Omega)$ in Algorithm 1, by solving an optimization problem over a set of trial functions. The resulting inexact policy iteration algorithm for the Dirichlet problem (Algorithm 2) is presented and, under some additional assumptions, its global superlinear convergence is proved.

The fifth section extends the developed iteration scheme to other boundary value problems and connects it to artificial neural networks. One considers the HJBI oblique derivative problem
\[ F(u):=-a^{ij}(x)\partial_{ij}u+G(x,u,\nabla u)=0,\ \text{ for a.e. }x\in\Omega,\qquad Bu:=\gamma^i\tau(\partial_i u)+\gamma^0\tau u-g=0\ \text{ on }\partial\Omega. \]
Under some assumptions on the coefficients, it is proved that the oblique derivative problem admits a unique strong solution in $H^2(\Omega)$. For solving the oblique derivative problem a neural network-based policy iteration algorithm (Algorithm 3) is developed, and its global superlinear convergence is proved.

The sixth section contains an extensive discussion of applications of the developed algorithms to the stochastic Zermelo navigation problem. Fundamental results used in the article are recalled at the end of the paper.
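To make the structure of the iteration described above concrete, here is a minimal, purely illustrative Python/PyTorch sketch of inexact policy iteration with neural-network trial functions for a toy one-dimensional analogue of the Dirichlet problem. The control sets, the coefficient functions (drift, running_cost, the constant c0), the network architecture and the optimizer settings are assumptions of this sketch and are not taken from the paper; only the structure mirrors the scheme: each outer step freezes the controls by a pointwise argmax/argmin of the Hamiltonian at the current iterate, then solves the linearized problem only approximately by minimizing its residual over the trial functions.

```python
# A minimal sketch, not the authors' implementation: inexact policy iteration with a
# neural-network ansatz for a toy 1D analogue of the HJBI Dirichlet problem
#   -u'' + max_{a in A} min_{b in B} ( b(a,b) u' + c u - f(x,a,b) ) = 0 on (0,1),
#   u(0) = u(1) = 0.
# All control sets, coefficients and hyperparameters below are illustrative choices.
import math
import torch

torch.manual_seed(0)
A = torch.tensor([-1.0, 1.0])      # finite control set of the maximizing player (assumed)
B = torch.tensor([-1.0, 1.0])      # finite control set of the minimizing player (assumed)
c0 = 1.0                           # zeroth-order coefficient c, taken constant here

def drift(a, b):                   # first-order coefficient b(x, a, b), constant in x here
    return a + 0.5 * b

def running_cost(x, a, b):         # source term f(x, a, b)
    return torch.sin(math.pi * x) + 0.1 * a * b

class TrialFunction(torch.nn.Module):
    """Small MLP multiplied by x(1-x), so the homogeneous Dirichlet data holds exactly."""
    def __init__(self, width=32):
        super().__init__()
        self.body = torch.nn.Sequential(
            torch.nn.Linear(1, width), torch.nn.Tanh(),
            torch.nn.Linear(width, width), torch.nn.Tanh(),
            torch.nn.Linear(width, 1),
        )

    def forward(self, x):
        return x * (1.0 - x) * self.body(x)

def derivatives(net, x):
    """Value, first and second derivatives of the trial function at the collocation points."""
    u = net(x)
    ux = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
    uxx = torch.autograd.grad(ux.sum(), x, create_graph=True)[0]
    return u, ux, uxx

net = TrialFunction()
x = torch.rand(256, 1, requires_grad=True)            # interior collocation points

for k in range(10):                                    # outer policy iteration
    # Policy update: pointwise argmax over A of the min over B of the Hamiltonian
    # evaluated at the current iterate; the controls are then frozen for the linear solve.
    u, ux, _ = derivatives(net, x)
    H = (drift(A.view(1, -1, 1), B.view(1, 1, -1)) * ux.detach().unsqueeze(-1)
         + c0 * u.detach().unsqueeze(-1)
         - running_cost(x.detach().unsqueeze(-1), A.view(1, -1, 1), B.view(1, 1, -1)))
    inner, ib_all = H.min(dim=2)                       # minimize over b for every a
    _, ia = inner.max(dim=1)                           # then maximize over a
    ib = ib_all.gather(1, ia.unsqueeze(1)).squeeze(1)
    a_k, b_k = A[ia].unsqueeze(1), B[ib].unsqueeze(1)

    # Inexact linear solve: minimize the residual of the linearized PDE over the trial
    # functions; a fixed number of Adam steps stands in for the inner optimization.
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)
    for _ in range(500):
        u, ux, uxx = derivatives(net, x)
        residual = -uxx + drift(a_k, b_k) * ux + c0 * u - running_cost(x, a_k, b_k)
        loss = residual.pow(2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    print(f"policy iteration {k}: mean squared residual {loss.item():.3e}")
```

Enforcing the homogeneous Dirichlet data through the factor $x(1-x)$ is just one simple way to keep every trial function admissible in this sketch; a penalty on the boundary residual would be an equally reasonable choice.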
    Hamilton-Jacobi-Bellman-Isaacs equations
    neural networks
    policy iteration
    inexact semismooth Newton method
    global convergence
    \(q\)-superlinear convergence

    Identifiers
