Adaptive policy for two finite Markov chains zero-sum stochastic game with unknown transition matrices and average payoffs (Q5939326)

scientific article; zbMATH DE number 1625695

Language	Label	Description	Also known as
English	Adaptive policy for two finite Markov chains zero-sum stochastic game with unknown transition matrices and average payoffs	scientific article; zbMATH DE number 1625695

Statements

instance of

scholarly article

0 references

title

Adaptive policy for two finite Markov chains zero-sum stochastic game with unknown transition matrices and average payoffs (English)

0 references

published in

Automatica

0 references

publication date

5 August 2002

0 references

review text

Finite stochastic games are competitive Markovian decision processes that involve two or more players (controllers). This paper presents a novel algorithm for zero-sum stochastic games involving two finite irreducible Markov chains with unknown transition matrices and average payoffs. The algorithm is based on Lagrange multipliers: a regularized Lagrangian function is introduced that ensures the uniqueness of the corresponding saddle point (equilibrium point), together with a new normalization procedure used in the adaptive strategy that asymptotically realizes this equilibrium. The proposed algorithm is adaptive in the sense that it provides learning control policies for both players. Convergence is established, and the adaptive control scheme is shown to have a convergence rate of order \(n^{-1/3}\).

0 references
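
The role of regularization mentioned in the review text can be illustrated on a much simpler problem. The sketch below is not the algorithm of the paper: it assumes a known 2x2 payoff matrix (matching pennies), adds a quadratic regularization term with a hypothetical weight `delta`, and runs projected gradient descent-ascent on the probability simplices. The function names, the step size, and the iteration count are all assumptions chosen for illustration. Because the regularized function is strongly convex in one player's strategy and strongly concave in the other's, its saddle point is unique, and different starting points should lead to the same strategies.

```python
def project_simplex(v):
    """Euclidean projection of a vector onto the probability simplex."""
    u = sorted(v, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for i, ui in enumerate(u, start=1):
        cumulative += ui
        t = (cumulative - 1.0) / i
        if ui - t > 0:
            theta = t  # keep the threshold of the last index that qualifies
    return [max(vi - theta, 0.0) for vi in v]


def regularized_saddle_point(A, delta=0.5, step=0.1, iters=2000,
                             x0=None, y0=None):
    """Projected gradient descent-ascent on the regularized function

        f(x, y) = x^T A y + (delta / 2) * (||x||^2 - ||y||^2),

    where x (minimizer) and y (maximizer) are mixed strategies.
    Illustrative only: A is assumed known, unlike in the reviewed paper.
    """
    m, n = len(A), len(A[0])
    x = list(x0) if x0 is not None else [1.0 / m] * m
    y = list(y0) if y0 is not None else [1.0 / n] * n
    for _ in range(iters):
        # Gradients are evaluated at the current pair (simultaneous update).
        grad_x = [sum(A[i][j] * y[j] for j in range(n)) + delta * x[i]
                  for i in range(m)]
        grad_y = [sum(A[i][j] * x[i] for i in range(m)) - delta * y[j]
                  for j in range(n)]
        x = project_simplex([x[i] - step * grad_x[i] for i in range(m)])
        y = project_simplex([y[j] + step * grad_y[j] for j in range(n)])
    return x, y


if __name__ == "__main__":
    payoff = [[1.0, -1.0], [-1.0, 1.0]]  # matching pennies
    print(regularized_saddle_point(payoff, x0=[1.0, 0.0], y0=[0.0, 1.0]))
    print(regularized_saddle_point(payoff, x0=[0.0, 1.0], y0=[0.0, 1.0]))
```

The step size is kept small relative to `delta` so that each update is a contraction for this particular matrix; other payoff matrices may need a different choice.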

reviewed by

Spyros G. Tzafestas

0 references

zbMATH Keywords

adaptive policy

0 references

finite stochastic games

0 references

zero-sum game

0 references

competitive Markovian decision processes