Publication: 5477859
From MaRDI portal
DOI: 10.1023/A:1018056104778
zbMath: 1099.93534
MaRDI QID: Q5477859
Andrew G. Barto, Steven J. Bradtke
Publication date: 29 June 2006
Published in: Machine Learning
93E24: Least squares and related methods for stochastic control systems
93E35: Stochastic learning and adaptive control
90C40: Markov and semi-Markov decision processes
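The MSC codes above place this publication in least-squares methods for temporal-difference learning. As a purely illustrative aside (the example trajectory, feature map, and discount factor below are assumptions, not taken from the record), a minimal sketch of the standard LSTD estimator — solve A w = b with A accumulating phi(s)(phi(s) - gamma*phi(s'))^T and b accumulating r*phi(s) — might look like:

```python
import numpy as np

def lstd(transitions, phi, gamma, n_features):
    """Least-squares temporal-difference value estimation.

    transitions: iterable of (state, reward, next_state) tuples.
    phi: feature map from state to an n_features vector.
    Returns weights w such that V(s) ~ phi(s) . w.
    """
    A = np.zeros((n_features, n_features))
    b = np.zeros(n_features)
    for s, r, s_next in transitions:
        f, f_next = phi(s), phi(s_next)
        A += np.outer(f, f - gamma * f_next)  # accumulate phi (phi - gamma phi')^T
        b += r * f                            # accumulate reward-weighted features
    return np.linalg.solve(A, b)

# Hypothetical two-state deterministic cycle: 0 -> 1 (reward 0), 1 -> 0 (reward 1),
# with tabular (one-hot) features, so LSTD recovers the exact values.
phi = lambda s: np.eye(2)[s]
gamma = 0.9
traj = [(0, 0.0, 1), (1, 1.0, 0)] * 5
w = lstd(traj, phi, gamma, 2)
# Bellman check: V(0) = gamma * V(1), V(1) = 1 + gamma * V(0),
# giving V(1) = 1 / (1 - gamma**2) and V(0) = gamma / (1 - gamma**2).
```

With one-hot features the linear system is just the tabular Bellman equations, so the solved weights equal the true state values; with general features, LSTD returns the fixed point of the projected Bellman operator on the feature span.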
Related Items
Batch mode reinforcement learning based on the synthesis of artificial trajectories
The optimal unbiased value estimator and its relation to LSTD, TD and MC
Asymptotic analysis of value prediction by well-specified and misspecified models
Solving factored MDPs using non-homogeneous partitions
A generalized Kalman filter for fixed point approximation and efficient temporal-difference learning
Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path
Projected equation methods for approximate solution of large linear systems
Natural actor-critic algorithms
Hybrid least-squares algorithms for approximate policy evaluation
Reinforcement learning for a biped robot based on a CPG-actor-critic method
Restricted gradient-descent algorithm for value-function approximation in reinforcement learning
Basis function adaptation in temporal difference reinforcement learning
Unnamed Item
Approximate policy iteration: a survey and some new methods
A review of stochastic algorithms with continuous value function approximation and some new approximate policy iteration algorithms for multidimensional continuous applications