A near-optimal polynomial time algorithm for learning in certain classes of stochastic games

Publication:1583226

DOI: 10.1016/S0004-3702(00)00039-4; zbMath: 0951.68119; Wikidata: Q126657516; Scholia: Q126657516; MaRDI QID: Q1583226

Ronen I. Brafman, Moshe Tennenholtz

Publication date: 26 October 2000

Published in: Artificial Intelligence

zbMATH Keywords

stochastic games; exploration versus exploitation in multi-agent systems; polynomial time learning in hostile environments

Mathematics Subject Classification ID

Learning and adaptive systems in artificial intelligence (68T05)

Related Items (8)

Reliability of internal prediction/estimation and its application. I: Adaptive action selection reflecting reliability of value function ⋮ AWESOME: a general multiagent learning algorithm that converges in self-play and learns a best response against stationary opponents ⋮ Value iteration for simple stochastic games: stopping criterion and learning algorithm ⋮ Computer science and decision theory ⋮ Perspectives on multiagent learning ⋮ Controller exploitation-exploration reinforcement learning architecture for computing near-optimal policies ⋮ Multi-agent reinforcement learning: a selective overview of theories and algorithms ⋮ Efficient learning equilibrium
