Adaptive policy for two finite Markov chains zero-sum stochastic game with unknown transition matrices and average payoffs (Q5939326)

From MaRDI portal
scientific article; zbMATH DE number 1625695

    Statements

    5 August 2002
    Finite stochastic games are competitive Markov decision processes involving two or more players (controllers). This paper presents a novel algorithm for zero-sum stochastic games on two finite irreducible Markov chains with unknown transition matrices and average payoffs. The algorithm is based on the Lagrange multiplier method. The authors introduce a regularized Lagrangian function, which guarantees (as they prove) that the corresponding saddle point (equilibrium point) is unique, together with a new normalization procedure that forms part of the adaptive strategy asymptotically attaining this equilibrium. The algorithm is adaptive in the sense that it provides learning control policies for both players. Its convergence properties are established, and the adaptive control scheme is shown to converge at a rate of order \(n^{-1/3}\).
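The review does not reproduce the authors' regularized Lagrangian, so the following is only a hedged illustration of why regularization makes a saddle point unique. It uses a plain one-shot matrix game (not the paper's Markov-chain setting, and with no unknown transitions, normalization procedure, or \(n^{-1/3}\) rate) with the Tikhonov-regularized payoff \(L_\delta(p,q) = p^\top A q - \tfrac{\delta}{2}\|p\|^2 + \tfrac{\delta}{2}\|q\|^2\), where the maximizer picks \(p\) and the minimizer picks \(q\) from probability simplices. For \(\delta > 0\) this payoff is strongly concave in \(p\) and strongly convex in \(q\), so its saddle point is unique. The sketch computes it by projected gradient descent-ascent; the function names `project_to_simplex` and `regularized_saddle_point`, the parameter `delta`, and the step-size rule are all hypothetical choices made for this example.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def project_to_simplex(v: Vector) -> Vector:
    """Euclidean projection of v onto the probability simplex (sort-based method)."""
    u = sorted(v, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for j, u_j in enumerate(u, start=1):
        cumulative += u_j
        candidate = (cumulative - 1.0) / j
        if u_j - candidate > 0:
            theta = candidate  # threshold for the largest valid index j
    return [max(x - theta, 0.0) for x in v]


def regularized_saddle_point(A: Matrix, delta: float = 0.1, tol: float = 1e-10,
                             max_iter: int = 200_000) -> Tuple[Vector, Vector]:
    """Hypothetical sketch: saddle point of p^T A q - delta/2 |p|^2 + delta/2 |q|^2.

    p (rows) maximizes and q (columns) minimizes. The regularization makes the
    gradient map strongly monotone, so projected gradient steps converge to
    the unique saddle point for a small enough step size.
    """
    m, n = len(A), len(A[0])
    frobenius = sum(a * a for row in A for a in row) ** 0.5
    # Step <= delta / Lipschitz^2, using the Frobenius norm as a safe bound.
    step = delta / (frobenius + delta) ** 2
    p = [1.0 / m] * m
    q = [1.0 / n] * n
    for _ in range(max_iter):
        # Gradient of L w.r.t. p: A q - delta * p (the maximizer ascends).
        grad_p = [sum(A[i][j] * q[j] for j in range(n)) - delta * p[i] for i in range(m)]
        # Gradient of L w.r.t. q: A^T p + delta * q (the minimizer descends).
        grad_q = [sum(A[i][j] * p[i] for i in range(m)) + delta * q[j] for j in range(n)]
        p_new = project_to_simplex([p[i] + step * grad_p[i] for i in range(m)])
        q_new = project_to_simplex([q[j] - step * grad_q[j] for j in range(n)])
        change = max(max(abs(a - b) for a, b in zip(p_new, p)),
                     max(abs(a - b) for a, b in zip(q_new, q)))
        p, q = p_new, q_new
        if change < tol:
            break
    return p, q


if __name__ == "__main__":
    # Matching pennies: the regularized saddle point is uniform play by symmetry.
    print(regularized_saddle_point([[1.0, -1.0], [-1.0, 1.0]]))
```

With \(\delta = 0\) the same iteration can cycle on games such as matching pennies; the regularization trades a small bias in the equilibrium (of order \(\delta\)) for uniqueness and convergence.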
    adaptive policy
    finite stochastic games
    zero-sum game
    competitive Markovian decision processes
    irreducible Markov chains
    average payoffs
    regularized Lagrangian function
    saddle-point
    normalization procedure
    learning control policies

    Identifiers