Q-learning for Markov decision processes with a satisfiability criterion

Learning and adaptive systems in artificial intelligence (68T05)
Markov and semi-Markov decision processes (90C40)
Stochastic systems in control theory (general) (93E03)

Recommendations

Exploiting the structural properties of the underlying Markov decision problem in the Q-learning algorithm
Q-learning for distributionally robust Markov decision processes
Learning algorithms for finite horizon constrained Markov decision processes
Learning algorithms for Markov decision processes
Actor-Critic--Type Learning Algorithms for Markov Decision Processes
Q-learning and enhanced policy iteration in discounted dynamic programming
\({\mathcal Q}\)-learning
Reinforcement learning in robust Markov decision processes

Cites work

scientific article; zbMATH DE number 5869530
scientific article; zbMATH DE number 3855514
scientific article; zbMATH DE number 5348356
An analog of the minimax theorem for vector payoffs
Approachability in Stackelberg stochastic games with vector costs
Approachable sets of vector payoffs in stochastic games
Asynchronous Stochastic Approximations
Asynchronous stochastic approximation with differential inclusions
Dynamical systems and variational inequalities
Estimation and control in discounted stochastic dynamic programming
Evolutionary Games and Population Dynamics
Guaranteed performance regions in Markovian systems with competing decision makers
Learning algorithms for Markov decision processes with average cost
Multiplicative updates outperform generic no-regret learning in congestion games (extended abstract)
Stochastic Approximations and Differential Inclusions
Stochastic Approximations and Differential Inclusions, Part II: Applications
Stochastic approximation with two time scales
Structural Properties of Optimal Transmission Policies Over a Randomly Varying Channel
Survey of Measurable Selection Theorems
The Borkar-Meyn theorem for asynchronous stochastic approximations
The O.D.E. Method for Convergence of Stochastic Approximation and Reinforcement Learning
The multiplicative weights update method: a meta-algorithm and applications
The projection dynamic and the geometry of population games

Cited in

\(L^\ast\)-based learning of Markov decision processes (extended version)
scientific article; zbMATH DE number 5670432
Prospect-theoretic Q-learning
Revisiting SIR in the age of COVID-19: explicit solutions and control problems
Counterexample explanation by learning small strategies in Markov decision processes

MaRDI item Q1749413