On-policy concurrent reinforcement learning
Publication:4670596
DOI: 10.1080/09528130412331297956; zbMATH Open: 1066.68106; OpenAlex: W1972847450; Wikidata: Q113437934; Scholia: Q113437934; MaRDI QID: Q4670596; FDO: Q4670596
Authors: Bikramjit Banerjee, Sandip Sen, Jing Peng
Publication date: 22 April 2005
Published in: Journal of Experimental & Theoretical Artificial Intelligence
Full work available at URL: https://doi.org/10.1080/09528130412331297956
Recommendations
- Multiagent learning using a variable learning rate
- A new \(Q\) learning algorithm for multi-agent systems
- AWESOME: a general multiagent learning algorithm that converges in self-play and learns a best response against stationary opponents
- scientific article; zbMATH DE number 5957383
- Individual Q-Learning in Normal Form Games
Cites Work
- Non-cooperative games
- \({\mathcal Q}\)-learning
- Multiagent learning using a variable learning rate
- An analysis of temporal-difference learning with function approximation
- Two-person nonzero-sum games and quadratic programming
- Convergence results for single-step on-policy reinforcement-learning algorithms
- Fast online \(Q(\lambda)\)
Cited In (1)