Multiagent reinforcement learning with regret matching for robot soccer (Q474272)
| Language | Label | Description | Also known as |
|---|---|---|---|
| English | Multiagent reinforcement learning with regret matching for robot soccer | scientific article | |
Statements
Multiagent reinforcement learning with regret matching for robot soccer (English)

publication date: 24 November 2014
Summary: This paper proposes a novel multiagent reinforcement learning (MARL) algorithm, Nash-\(Q\) learning with regret matching, in which regret matching is used to speed up the well-known Nash-\(Q\) learning algorithm. Choosing a suitable action-selection strategy that balances exploration against exploitation is critical to the online learning ability of Nash-\(Q\) learning. In a Markov game, the joint actions of agents that adopt the regret matching algorithm converge to a set of no-regret points, which can be viewed as the coarse correlated equilibria, a solution concept that subsumes Nash equilibrium. It can therefore be inferred that regret matching guides exploration of the state-action space, so that the convergence rate of the Nash-\(Q\) learning algorithm is increased. Simulation results on robot soccer validate that, compared with the original Nash-\(Q\) learning algorithm, using regret matching during the learning phase yields excellent online learning ability and significantly better performance in terms of scores, average reward, and policy convergence.
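The paper itself provides no code; the following is a minimal Python sketch of the regret-matching action-selection rule the summary describes, assuming per-action counterfactual payoffs (e.g., read off an agent's Q-table row, holding the other agents' observed joint action fixed) are available after each step. The names here (`RegretMatcher`, `action_payoffs`, `Q_s`) are illustrative, not from the paper.

```python
import numpy as np


class RegretMatcher:
    """Regret-matching action selection in the style of Hart and Mas-Colell.

    Keeps a cumulative regret per action and samples actions with
    probability proportional to the positive part of that regret.
    """

    def __init__(self, n_actions: int):
        self.n_actions = n_actions
        self.cum_regret = np.zeros(n_actions)

    def policy(self) -> np.ndarray:
        # Mixed strategy proportional to positive regret; fall back to
        # uniform when no action has accumulated positive regret yet.
        positive = np.maximum(self.cum_regret, 0.0)
        total = positive.sum()
        if total > 0.0:
            return positive / total
        return np.full(self.n_actions, 1.0 / self.n_actions)

    def select_action(self, rng: np.random.Generator) -> int:
        return int(rng.choice(self.n_actions, p=self.policy()))

    def update(self, action_payoffs: np.ndarray, played: int) -> None:
        # Regret of each action = what it would have paid (with the other
        # agents' joint action fixed) minus what the played action paid.
        self.cum_regret += action_payoffs - action_payoffs[played]


# Hypothetical wiring into a Nash-Q learner: Q_s stands in for agent i's
# Q-values at the current state, indexed by (own action, others' action).
rng = np.random.default_rng(0)
matcher = RegretMatcher(n_actions=4)
Q_s = np.random.default_rng(1).normal(size=(4, 3))  # stand-in Q-values
a_other = 2                      # joint action the other agents just played
a_i = matcher.select_action(rng)
matcher.update(Q_s[:, a_other], played=a_i)
```

One way to read the design: actions whose positive regret grows keep receiving probability mass (exploitation), while the uniform fallback and the proportional sampling guarantee that no action is starved early on (exploration), which is the exploration-exploitation balance the summary credits for the faster convergence of Nash-\(Q\) learning.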