On stochastic games with lack of information on one side (Q1119190)
From MaRDI portal
scientific article
| Language | Label | Description | Also known as |
|---|---|---|---|
| English | On stochastic games with lack of information on one side | scientific article | |
Statements
On stochastic games with lack of information on one side (English)
1989
A stochastic game with lack of information on one side (SGLIOS) is a two-person zero-sum game played in stages. Assume that at stage \(N\) the game is in state \(s_N\), \(s_N\in\{1,\dots,k\}\), and that \(G^{s_N}\) is an \(m\times n\) matrix. Each player then chooses an action (a row \(i_N\) and a column \(j_N\) of the state matrix), the immediate payoff \(G^{s_N}_{i_N j_N}\) enters a list kept by nature, the game moves to a new state \(s_{N+1}\) according to a law \(q(s_{N+1}\mid s_N,i_N,j_N)\), and everything starts over again. Both players learn the moves \((i_N,j_N)\), but only player I knows the state the game is in at each stage. If \(q(s_{N+1}\mid s_N,i_N,j_N)=q(s_{N+1}\mid s_N)\), the SGLIOS is called Markov, and one speaks of an MGLIOS. In this paper SGLIOS are examined under the limiting expected average payoff criterion (discounted SGLIOS are examined in another paper). In part 2 it is shown that player I need not remember the past sequence of states, and an alternative way of playing \(\Gamma\), the SGLIOS, is introduced through updating. In part 3 it is shown that if the resulting Markov chain has one ergodic aperiodic class, then the value of the MGLIOS exists. In part 4 a sufficient condition is given for the optimality of the ``one stage look ahead'' (1-SLA) strategy of player I, which is a function of \(p_N\), the posterior distribution on the state space given the history up to stage \(N\); the condition is applied to the case of an MGLIOS with \(k=2\) and a sequential mode of moves. It is then shown that on a ``wedge shaped'' neighborhood of 0 the value of the game is \(v_1(\pi_0)\), where \(v_1\) is the value function of the one-stage game and \(\pi_0\) is the steady state of the transition matrix. In this neighborhood, the information that player II obtains along the course of play is of no use to him. Finally, part 5 deals with a particular example similar to one well known from the theory of repeated games with lack of information on one side. The value, which differs from \(v_1(\pi_0)\), and the optimal strategy of player I, which is a 1-SLA strategy different from the usual one, are obtained by dynamic programming methods.
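As a reading aid, the display below sketches the standard Bayesian updating of \(p_N\) for this class of games and one common formalisation of the limiting expected average payoff criterion; the state-dependent mixed action \(x_s\) of player I is notation introduced here for illustration and is not taken from the paper.
\[
p_{N+1}(s') \;=\; \frac{\sum_{s=1}^{k} p_N(s)\, x_s(i_N)\, q(s'\mid s,i_N,j_N)}{\sum_{s=1}^{k} p_N(s)\, x_s(i_N)},
\qquad
\liminf_{T\to\infty}\frac{1}{T}\,E\Big[\sum_{N=1}^{T} G^{s_N}_{i_N j_N}\Big].
\]
Since player II's move \(j_N\) carries no information about the current state, only player I's move enters the normalisation in the denominator.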
stochastic game with lack of information on one side
limiting expected average payoff criterion
Markov chain