Online Adaptive Policy Selection in Time-Varying Systems: No-Regret via Contractive Perturbations
Publication:6414751
arXiv: 2210.12320; MaRDI QID: Q6414751; FDO: Q6414751
Emile Anand, Yisong Yue, Yingying Li, Author name not available, Adam Wierman, James A. Preiss
Publication date: 21 October 2022
Abstract: We study online adaptive policy selection in systems with time-varying costs and dynamics. We develop the Gradient-based Adaptive Policy Selection (GAPS) algorithm together with a general analytical framework for online policy selection via online optimization. Under our proposed notion of contractive policy classes, we show that GAPS approximates the behavior of an ideal online gradient descent algorithm on the policy parameters while requiring less information and computation. When convexity holds, our algorithm is the first to achieve optimal policy regret. When convexity does not hold, we provide the first local regret bound for online policy selection. Our numerical experiments show that GAPS can adapt to changing environments more quickly than existing benchmarks.
Has companion code repository: https://github.com/jpreiss/adaptive_policy_selection