Safe learning for near-optimal scheduling


DOI: 10.1007/978-3-030-85172-9_13
zbMATH Open: 1491.68151
arXiv: 2005.09253
OpenAlex: W3196616774
MaRDI QID: Q832074
FDO: Q832074


Authors: Damien Busatto-Gaston, Debraj Chakraborty, Shibashis Guha, G. A. Pérez, Jean-François Raskin


Publication date: 24 March 2022

Abstract: In this paper, we investigate the combination of synthesis, model-based learning, and online sampling techniques to obtain safe and near-optimal schedulers for a preemptible task scheduling problem. Our algorithms can handle Markov decision processes (MDPs) with 10^20 states and beyond, which cannot be handled by state-of-the-art probabilistic model checkers. We provide probably approximately correct (PAC) guarantees for learning the model. Additionally, we extend Monte Carlo tree search with advice, computed using safety games or obtained from the earliest-deadline-first scheduler, to safely explore the learned model online. Finally, we implemented our algorithms and compared them empirically against shielded deep Q-learning on large task systems.
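The abstract mentions using the earliest-deadline-first (EDF) scheduler as advice that guides online exploration. The following is a minimal, hypothetical sketch (not the authors' implementation; all names are illustrative) of how an EDF rule can be phrased as an action filter for a task-system scheduler, which is the kind of advice a tree-search procedure could apply when expanding nodes.

```python
# Hypothetical sketch: EDF-style advice as an action filter for a task-system
# scheduling MDP. Illustrative only; not taken from the paper's code.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Task:
    task_id: int
    remaining_work: int       # computation time still required (time units)
    time_to_deadline: int     # time units left before the deadline expires


def edf_advice(active_tasks: List[Task]) -> Optional[int]:
    """Return the id of the task an EDF scheduler would run, or None if idle."""
    pending = [t for t in active_tasks if t.remaining_work > 0]
    if not pending:
        return None
    return min(pending, key=lambda t: t.time_to_deadline).task_id


def allowed_actions(active_tasks: List[Task],
                    all_actions: List[Optional[int]]) -> List[Optional[int]]:
    """Advice as a filter: keep only the EDF choice when it is available,
    otherwise fall back to the full action set so exploration is not blocked."""
    choice = edf_advice(active_tasks)
    return [choice] if choice in all_actions else all_actions


if __name__ == "__main__":
    tasks = [Task(0, remaining_work=2, time_to_deadline=5),
             Task(1, remaining_work=1, time_to_deadline=3)]
    print(edf_advice(tasks))                      # -> 1 (earliest deadline)
    print(allowed_actions(tasks, [None, 0, 1]))   # -> [1]
```

In the paper's setting such advice would be combined with safety constraints (computed via safety games) and Monte Carlo tree search; the sketch above only illustrates the EDF selection rule itself.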


Full work available at URL: https://arxiv.org/abs/2005.09253





