Safe learning for near-optimal scheduling


DOI: 10.1007/978-3-030-85172-9_13
zbMATH Open: 1491.68151
arXiv: 2005.09253
OpenAlex: W3196616774
MaRDI QID: Q832074
FDO: Q832074


Authors: Damien Busatto-Gaston, Debraj Chakraborty, Shibashis Guha, G. A. Pérez, Jean-François Raskin


Publication date: 24 March 2022

Abstract: In this paper, we investigate the combination of synthesis, model-based learning, and online sampling techniques to obtain safe and near-optimal schedulers for a preemptible task scheduling problem. Our algorithms can handle Markov decision processes (MDPs) with 10^20 states and beyond, which cannot be handled by state-of-the-art probabilistic model checkers. We provide probably approximately correct (PAC) guarantees for learning the model. Additionally, we extend Monte Carlo tree search with advice, computed using safety games or obtained from the earliest-deadline-first scheduler, to safely explore the learned model online. Finally, we implemented our algorithms and compared them empirically against shielded deep Q-learning on large task systems.
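The abstract mentions using the earliest-deadline-first (EDF) scheduler as advice that guides online exploration. The following is a minimal, hypothetical sketch (not the authors' implementation; all names are illustrative) of how an EDF rule can be phrased as an action filter for a task-system scheduler, which is the kind of advice a tree-search procedure could apply when expanding nodes.

```python
# Hypothetical sketch: EDF-style advice as an action filter for a task-system
# scheduling MDP. Illustrative only; not taken from the paper's code.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Task:
    task_id: int
    remaining_work: int       # computation time still required (time units)
    time_to_deadline: int     # time units left before the deadline expires


def edf_advice(active_tasks: List[Task]) -> Optional[int]:
    """Return the id of the task an EDF scheduler would run, or None if idle."""
    pending = [t for t in active_tasks if t.remaining_work > 0]
    if not pending:
        return None
    return min(pending, key=lambda t: t.time_to_deadline).task_id


def allowed_actions(active_tasks: List[Task],
                    all_actions: List[Optional[int]]) -> List[Optional[int]]:
    """Advice as a filter: keep only the EDF choice when it is available,
    otherwise fall back to the full action set so exploration is not blocked."""
    choice = edf_advice(active_tasks)
    return [choice] if choice in all_actions else all_actions


if __name__ == "__main__":
    tasks = [Task(0, remaining_work=2, time_to_deadline=5),
             Task(1, remaining_work=1, time_to_deadline=3)]
    print(edf_advice(tasks))                      # -> 1 (earliest deadline)
    print(allowed_actions(tasks, [None, 0, 1]))   # -> [1]
```

In the paper's setting such advice would be combined with safety constraints (computed via safety games) and Monte Carlo tree search; the sketch above only illustrates the EDF selection rule itself.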


Full work available at URL: https://arxiv.org/abs/2005.09253





