Value iteration for simple stochastic games: stopping criterion and learning algorithm

From MaRDI portal
Publication:6041135

DOI10.1007/978-3-319-96145-3_36zbMATH Open1511.91010arXiv1804.04901OpenAlexW2963112285MaRDI QIDQ6041135FDOQ6041135


Authors: Edon Kelmendi, Julia Krämer, Jan Křetínský, Maximilian Weininger Edit this on Wikidata


Publication date: 26 May 2023

Published in: Computer Aided Verification (Search for Journal in Brave)

Abstract: Simple stochastic games can be solved by value iteration (VI), which yields a sequence of under-approximations of the value of the game. This sequence is guaranteed to converge to the value only in the limit. Since no stopping criterion is known, this technique does not provide any guarantees on its results. We provide the first stopping criterion for VI on simple stochastic games. It is achieved by additionally computing a convergent sequence of over-approximations of the value, relying on an analysis of the game graph. Consequently, VI becomes an anytime algorithm returning the approximation of the value and the current error bound. As another consequence, we can provide a simulation-based asynchronous VI algorithm, which yields the same guarantees, but without necessarily exploring the whole game graph.


Full work available at URL: https://arxiv.org/abs/1804.04901




Recommendations




Cited In (13)





This page was built for publication: Value iteration for simple stochastic games: stopping criterion and learning algorithm

Report a bug (only for logged in users!)Click here to report a bug for this page (MaRDI item Q6041135)