Publication:5477859
DOI: 10.1023/A:1018056104778, zbMath: 1099.93534, MaRDI QID: Q5477859
Andrew G. Barto, Steven J. Bradtke
Publication date: 29 June 2006
Published in: Machine Learning
93E24: Least squares and related methods for stochastic control systems
93E35: Stochastic learning and adaptive control
90C40: Markov and semi-Markov decision processes
Related Items
The optimal unbiased value estimator and its relation to LSTD, TD and MC
Solving factored MDPs using non-homogeneous partitions
A generalized Kalman filter for fixed point approximation and efficient temporal-difference learning
Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path
Projected equation methods for approximate solution of large linear systems
Natural actor-critic algorithms
Hybrid least-squares algorithms for approximate policy evaluation
Reinforcement learning for a biped robot based on a CPG-actor-critic method
Restricted gradient-descent algorithm for value-function approximation in reinforcement learning
Basis function adaptation in temporal difference reinforcement learning
Unnamed Item