Adaptive policy for two finite Markov chains zero-sum stochastic game with unknown transition matrices and average payoffs (Q5939326)
scientific article; zbMATH DE number 1625695
Statements
Adaptive policy for two finite Markov chains zero-sum stochastic game with unknown transition matrices and average payoffs (English)
5 August 2002
Finite stochastic games are competitive Markovian decision processes that involve two or more players (controllers). This paper presents a novel algorithm for zero-sum stochastic games of two finite irreducible Markov chains with unknown transition matrices and average payoffs. The algorithm is based on Lagrange multipliers: a regularized Lagrangian function is introduced whose saddle-point (equilibrium point) is shown to be unique, together with a new normalization procedure used in the adaptive strategy that asymptotically realizes this equilibrium. The proposed algorithm is adaptive in the sense that it provides learning control policies for both players. Its convergence properties are established, and the adaptive control scheme is shown to converge at a rate of order \(n^{-1/3}\).
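
The role of the regularization can be illustrated on a much simpler problem than the one treated in the paper. The sketch below is not the article's algorithm: it assumes a static zero-sum matrix game with a known payoff matrix `A`, and a hypothetical regularized function L(x, y) = xᵀAy + (δ/2)(‖x‖² − ‖y‖²), which is strictly convex in the minimizer's mixed strategy `x` and strictly concave in the maximizer's mixed strategy `y`, so its saddle-point is unique. The names `project_simplex` and `regularized_saddle`, the regularization form, and the parameter values are all assumptions chosen for illustration.

```python
def project_simplex(v):
    """Euclidean projection of a list of floats onto the probability simplex."""
    u = sorted(v, reverse=True)
    running, theta = 0.0, 0.0
    for k, u_k in enumerate(u, start=1):
        running += u_k
        t = (running - 1.0) / k
        if u_k - t > 0:
            theta = t  # keep the threshold from the last index that qualifies
    return [max(v_i - theta, 0.0) for v_i in v]


def regularized_saddle(A, delta=0.5, step=0.05, iters=2000):
    """Projected gradient descent-ascent on the assumed regularized function
    L(x, y) = x^T A y + (delta / 2) * (|x|^2 - |y|^2).
    x minimizes, y maximizes; both are kept on the probability simplex."""
    m, n = len(A), len(A[0])
    # Start from pure strategies (simplex vertices), far from the interior.
    x = [1.0] + [0.0] * (m - 1)
    y = [0.0] * (n - 1) + [1.0]
    for _ in range(iters):
        # Gradients of L at the current point (simultaneous update).
        gx = [sum(A[i][j] * y[j] for j in range(n)) + delta * x[i] for i in range(m)]
        gy = [sum(A[i][j] * x[i] for i in range(m)) - delta * y[j] for j in range(n)]
        x = project_simplex([x[i] - step * gx[i] for i in range(m)])
        y = project_simplex([y[j] + step * gy[j] for j in range(n)])
    return x, y


# Matching pennies: by symmetry the regularized saddle-point is uniform play.
x_star, y_star = regularized_saddle([[1.0, -1.0], [-1.0, 1.0]])
print(x_star, y_star)
```

Without the δ term, plain gradient descent-ascent on a bilinear game can cycle rather than settle, which is one reason a regularized function is convenient in this toy setting. The paper's setting additionally involves unknown transition matrices and learned policies, which this sketch does not attempt to model.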
adaptive policy
finite stochastic games
zero-sum game
competitive Markovian decision processes
irreducible Markov chains
average payoffs
regularized Lagrangian function
saddle-point
normalization procedure
learning control policies