A near-optimal polynomial time algorithm for learning in certain classes of stochastic games
DOI: 10.1016/S0004-3702(00)00039-4; zbMath: 0951.68119; Wikidata: Q126657516; Scholia: Q126657516; MaRDI QID: Q1583226
Ronen I. Brafman, Moshe Tennenholtz
Publication date: 26 October 2000
Published in: Artificial Intelligence
Keywords: stochastic games; exploration versus exploitation in multi-agent systems; polynomial time learning in hostile environments
Related Items (8)
- Reliability of internal prediction/estimation and its application. I: Adaptive action selection reflecting reliability of value function
- AWESOME: a general multiagent learning algorithm that converges in self-play and learns a best response against stationary opponents
- Value iteration for simple stochastic games: stopping criterion and learning algorithm
- Computer science and decision theory
- Perspectives on multiagent learning
- Controller exploitation-exploration reinforcement learning architecture for computing near-optimal policies
- Multi-agent reinforcement learning: a selective overview of theories and algorithms
- Efficient learning equilibrium
Cites Work
- Unnamed Item
- Unnamed Item
- An orderfield property for stochastic games when one player controls transition probabilities
- Linear programming and undiscounted stochastic games in which one player controls transitions
- Near-optimal reinforcement learning in polynomial time
- A Measure of Asymptotic Efficiency for Tests of a Hypothesis Based on the Sum of Observations
- Stochastic Games