A near-optimal polynomial time algorithm for learning in certain classes of stochastic games (Q1583226)

scientific article

Statements

instance of

scholarly article

title

A near-optimal polynomial time algorithm for learning in certain classes of stochastic games (English)

published in

Artificial Intelligence

publication date

26 October 2000

review text

We present a new algorithm for polynomial time learning of optimal behavior in single-controller stochastic games. This algorithm incorporates and integrates important recent results of \textit{Kearns} and \textit{Singh} [Proc. ICML-98 (1998)] in reinforcement learning and of \textit{Monderer} and \textit{M. Tennenholtz} [J. Artif. Intell. Res. 7, 231 (1997)] in repeated games. In stochastic games, the agent must cope with the existence of an adversary whose actions can be arbitrary. In particular, this adversary can withhold information about the game matrix by never (or only rarely) performing certain actions. This forces upon us an exploration versus exploitation dilemma more complex than the one in Markov decision processes: given information about particular parts of a game matrix, the agent must decide how much effort to invest in learning the unknown parts of the matrix. We present a polynomial time algorithm that addresses these issues in the context of the class of single-controller stochastic games, providing the agent with near-optimal return.
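
The exploration problem described above (an adversary that hides parts of the game matrix by avoiding certain actions) can be illustrated with a deliberately simplified sketch. This is NOT the algorithm of the reviewed paper. It assumes a single-state repeated game, fully observed adversary actions, a known payoff upper bound (the hypothetical constant R_MAX), and pure-strategy maximin play (a faithful treatment would use mixed strategies). Unobserved matrix entries are filled optimistically with R_MAX, a swapped-in "optimism under uncertainty" heuristic. The names optimistic_matrix, pure_maximin_action and update are hypothetical.

```python
from typing import List, Optional

# Hypothetical assumption: the agent knows an upper bound on its own payoff.
R_MAX = 1.0

Matrix = List[List[Optional[float]]]


def optimistic_matrix(observed: Matrix, r_max: float = R_MAX) -> List[List[float]]:
    """Replace each unobserved entry (None) with the optimistic bound r_max."""
    return [[r_max if v is None else v for v in row] for row in observed]


def pure_maximin_action(matrix: List[List[float]]) -> int:
    """Return the agent's row whose worst case over adversary columns is largest."""
    worst = [min(row) for row in matrix]
    return max(range(len(matrix)), key=lambda i: worst[i])


def update(observed: Matrix, my_action: int, adv_action: int, payoff: float) -> None:
    """Record the payoff revealed when the adversary actually plays adv_action."""
    observed[my_action][adv_action] = payoff


if __name__ == "__main__":
    # Rows: agent actions; columns: adversary actions; None = never observed.
    observed: Matrix = [[0.5, None], [0.3, 0.4]]
    print(pure_maximin_action(optimistic_matrix(observed)))  # → 0
    # If the adversary now reveals a low payoff for column 1, the agent re-plans.
    update(observed, 0, 1, 0.1)
    print(pure_maximin_action(optimistic_matrix(observed)))  # → 1
```

Under these assumptions, an adversary that never plays a column cannot lower the agent's optimistic estimate through that column. Each time the agent's optimistic estimate turns out to be wrong, a previously unknown entry is revealed, so the number of such surprises is bounded by the number of matrix entries.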

zbMATH Keywords

stochastic games

polynomial time learning in hostile environments

exploration versus exploitation in multi-agent systems

author

Ronen I. Brafman

Moshe Tennenholtz