Stochastic limit-average games are in EXPTIME (Q933752)

A repeated zero-sum stochastic game \(G=(S,A,\Gamma_1,\Gamma_2, \delta,r)\) played by two players is considered. Here \(S\) and \(A\) are finite space of states and set of actions respectively, \(\Gamma_i(s) \subseteq A\), \(i=1,2\), \(s \in S\), describe possible sets of actions of players at each state. If in the state \(s\) players choose actions \(a\) and \(b\) respectively then \(\delta(s,a,b)(t)\) is a probability to reach the state \(t\) and \(r(s,a,b)\) is a real-valued reward of the first player. Strategies \(\sigma\) and \(\pi\) of players can depend on all known history of the game. The limit-average for the player 1 reward of a pair of strategies \(\sigma\) and \(\pi\) and a starting state \(s\) is defined as \[ v_1(s,\sigma,\pi)=E_s^{\sigma, \pi}\left[ \liminf_{n \to \infty} n^{-1} \sum_{i=1}^n r(X_i,\Theta_{i,1}, \Theta_{i.2}) \right], \] where \(X_i\), \(\Theta_{i,1}\), \(\Theta_{i.2}\) stand for the state and chosen actions at instant of time \(i\). In the article the computational complexity of approximating the value of the game is characterized. It is shown that for any given real \(\varepsilon >0\) the value of a game \(G\) at a state \(s\) can be computed to within \(\varepsilon\)-precision in time bounded by an exponential in a polynomial in the size of the game \(G\) times a polynomial function of \(\log 1/\varepsilon\). The main technique is the characterization of values as semi-algebraic quantities.

0 references

reviewed by

Alex V. Kolnogorov

0 references

zbMATH Keywords

stochastic games

0 references

limit-average payoff

0 references

computational complexity

0 references

MaRDI profile type