A Bayesian reinforcement learning approach in Markov games for computing near-optimal policies
DOI: 10.1007/s10472-023-09860-3
zbMath: 1527.91013
OpenAlex: W4380153442
MaRDI QID: Q6059222
Publication date: 2 November 2023
Published in: Annals of Mathematics and Artificial Intelligence
Full work available at URL: https://doi.org/10.1007/s10472-023-09860-3
Learning and adaptive systems in artificial intelligence (68T05)
Applications of Markov chains and discrete-time Markov processes on general state spaces (social mobility, learning theory, industrial processes, etc.) (60J20)
Stochastic games, stochastic differential games (91A15)
Games with incomplete information, Bayesian games (91A27)
Cites Work
- Computing the Stackelberg/Nash equilibria using the extraproximal method: convergence analysis and implementation details for Markov chains games
- Dual-control theory. I
- Hierarchical Bayesian models of reinforcement learning: introduction and comparison to alternative methods
- Preconditioning Markov chain Monte Carlo method for geomechanical subsidence using multiscale method and machine learning technique
- A proximal/gradient approach for computing the Nash equilibrium in controllable Markov games
- Toward optimal probabilistic active learning using a Bayesian approach
- A Markovian Stackelberg game approach for computing an optimal dynamic mechanism
- Controller exploitation-exploration reinforcement learning architecture for computing near-optimal policies
- A Tikhonov regularized penalty function approach for solving polylinear programming problems
- The price of anarchy as a classifier for mechanism design in a Pareto-Bayesian-Nash context
- Convex Optimization: Algorithms and Complexity
- Dual Control for Approximate Bayesian Reinforcement Learning
- A Tikhonov regularization parameter approach for solving Lagrange constrained optimization problems