Kalman temporal differences

DOI 10.1613/JAIR.3077
MaRDI QID Q3055813

Authors Matthieu Geist, Olivier Pietquin

Publication date 10 November 2010

Published in Journal of Artificial Intelligence Research

Full work available at URL https://arxiv.org/abs/1406.3270

Learning and adaptive systems in artificial intelligence (68T05); Problem solving in the context of artificial intelligence (heuristics, search strategies, etc.) (68T20); Markov processes (60J99); Probability in computer science (algorithm analysis, random structures, phase transitions, etc.) (68Q87)

Abstract: Because reinforcement learning suffers from a lack of scalability, online value (and Q-) function approximation has received increasing interest over the last decade. This contribution introduces a novel approximation scheme, namely the Kalman Temporal Differences (KTD) framework, that exhibits the following features: sample efficiency, non-linear approximation, non-stationarity handling and uncertainty management. A first KTD-based algorithm is provided for deterministic Markov Decision Processes (MDPs); it produces biased estimates in the case of stochastic transitions. Then the eXtended KTD framework (XKTD), which handles stochastic MDPs, is described. Convergence is analyzed in special cases for both deterministic and stochastic transitions. The related algorithms are evaluated on classical benchmarks. They compare favorably to the state of the art while exhibiting the announced features.
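The idea summarized in the abstract can be illustrated with a minimal sketch, assuming a linear value function and a deterministic MDP. The weights are treated as the hidden state of a Kalman filter that follows a random walk, and each observed reward is modeled as V(s) - gamma * V(s') plus noise. The function names, the noise parameters, and the two-state cycle used as test data are all hypothetical choices made for illustration; this is not the authors' implementation, and it does not cover the non-linear or stochastic (XKTD) cases.

```python
def ktd_linear_step(theta, P, phi_s, phi_next, reward, gamma,
                    process_noise=0.0, obs_noise=1.0):
    """One Kalman-filter update of linear value weights from one transition.

    theta: list of weights; P: covariance matrix as a list of lists.
    All names and defaults here are illustrative assumptions.
    """
    n = len(theta)
    # Prediction: a random-walk model keeps the mean and inflates the covariance.
    P_pred = [[P[i][j] + (process_noise if i == j else 0.0) for j in range(n)]
              for i in range(n)]
    # Observation model: reward = (phi(s) - gamma * phi(s'))^T theta + noise.
    h = [phi_s[i] - gamma * phi_next[i] for i in range(n)]
    Ph = [sum(P_pred[i][j] * h[j] for j in range(n)) for i in range(n)]
    innovation = reward - sum(h[i] * theta[i] for i in range(n))
    innovation_var = sum(h[i] * Ph[i] for i in range(n)) + obs_noise
    gain = [Ph[i] / innovation_var for i in range(n)]
    # Correction: move the weights along the gain and shrink the covariance.
    theta_new = [theta[i] + gain[i] * innovation for i in range(n)]
    P_new = [[P_pred[i][j] - innovation_var * gain[i] * gain[j]
              for j in range(n)] for i in range(n)]
    return theta_new, P_new


def run_two_state_cycle(n_steps=1000, gamma=0.5):
    """Hypothetical deterministic MDP: 0 -> 1 (reward 1), 1 -> 0 (reward 0)."""
    features = [[1.0, 0.0], [0.0, 1.0]]   # one-hot (tabular) features
    theta = [0.0, 0.0]                     # prior mean of the weights
    P = [[10.0, 0.0], [0.0, 10.0]]         # prior covariance of the weights
    state = 0
    for _ in range(n_steps):
        next_state = 1 - state
        reward = 1.0 if state == 0 else 0.0
        theta, P = ktd_linear_step(theta, P, features[state],
                                   features[next_state], reward, gamma)
        state = next_state
    return theta, P
```

For this cycle with gamma = 0.5, the Bellman equations give V(0) = 4/3 and V(1) = 2/3, so the weights returned by `run_two_state_cycle()` should approach those values, while the diagonal of `P` gives a rough per-state uncertainty estimate.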

Cited in (6)
