Deep Q-learning: A robust control approach

From MaRDI portal
Publication:6136628

DOI: 10.1002/RNC.6457
zbMATH Open: 1530.93087
arXiv: 2201.08610
OpenAlex: W4307820093
MaRDI QID: Q6136628
FDO: Q6136628


Authors: Balázs Varga, B. Kulcsár, Morteza Haghir Chehreghani


Publication date: 17 January 2024

Published in: International Journal of Robust and Nonlinear Control

Abstract: In this paper, we place deep Q-learning into a control-oriented perspective and study its learning dynamics with well-established techniques from robust control. We formulate an uncertain linear time-invariant model by means of the neural tangent kernel to describe learning. We show the instability of learning and analyze the agent's behavior in the frequency domain. Then, we ensure convergence via robust controllers acting as dynamical rewards in the loss function. We synthesize three controllers: a state-feedback gain-scheduling H2 controller, a dynamic H∞ controller, and a constant-gain H∞ controller. Setting up the learning agent with a control-oriented tuning methodology is more transparent and rests on well-established literature, compared with the heuristics used in reinforcement learning. In addition, our approach uses neither a target network nor a randomized replay memory. The role of the target network is taken over by the control input, which also exploits the temporal dependency of samples (as opposed to a randomized memory buffer). Numerical simulations in different OpenAI Gym environments suggest that the H∞-controlled learning performs slightly better than double deep Q-learning.
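To make the loss-shaping idea concrete, below is a minimal, hypothetical sketch (PyTorch, not the authors' implementation) of a Q-learning update in which a control input added to the temporal-difference target stands in for the target network. The constant gain K is a placeholder; the paper synthesizes its controllers (H2/H∞) from the NTK-based learning dynamics, which is not reproduced here. The transitions are synthetic stand-ins processed in temporal order rather than drawn from a randomized replay memory.

```python
# Hypothetical sketch only (assumed names; not the authors' code): a constant-gain
# control input shapes the TD target in place of a target network. The paper
# synthesizes the gain via Hinf design on NTK-based learning dynamics; here K is
# an arbitrary placeholder.
import torch
import torch.nn as nn

class QNet(nn.Module):
    def __init__(self, obs_dim, n_actions):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, x):
        return self.net(x)

obs_dim, n_actions, gamma = 4, 2, 0.99
q = QNet(obs_dim, n_actions)
opt = torch.optim.SGD(q.parameters(), lr=1e-3)
K = 0.5  # placeholder constant controller gain

for step in range(1000):
    # Transitions are consumed in temporal order (no randomized replay buffer);
    # random tensors stand in for an actual environment rollout.
    s = torch.randn(1, obs_dim)
    a = torch.randint(n_actions, (1, 1))
    r = torch.randn(1)
    s_next = torch.randn(1, obs_dim)

    with torch.no_grad():
        q_next = q(s_next).max(dim=1).values            # bootstrap from the online network
        td_error = r + gamma * q_next - q(s).gather(1, a).squeeze(1)
        u = -K * td_error                               # control input acting on the learning error

    # The control input enters the loss as an auxiliary "dynamical reward" term.
    q_sa = q(s).gather(1, a).squeeze(1)
    loss = 0.5 * (r + gamma * q_next + u - q_sa).pow(2).mean()

    opt.zero_grad()
    loss.backward()
    opt.step()
```

In this simplified form the control input merely damps the effective TD error; the appeal of the control-oriented view is that such a gain can be tuned with standard robust-control machinery instead of reinforcement-learning heuristics.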


Full work available at URL: https://arxiv.org/abs/2201.08610






