10.1162/153244303768966102
From MaRDI portal
Publication:3044133
DOI: 10.1162/153244303768966102 · zbMath: 1088.68752 · OpenAlex: W2131420237 · MaRDI QID: Q3044133
Publication date: 10 August 2004
Published in: CrossRef Listing of Deleted DOIs
Full work available at URL: https://doi.org/10.1162/153244303768966102
Related Items
A review of stochastic algorithms with continuous value function approximation and some new approximate policy iteration algorithms for multidimensional continuous applications
Human motor learning is robust to control-dependent noise
On the convergence of reinforcement learning with Monte Carlo exploring starts
A Lyapunov-based version of the value iteration algorithm formulated as a discrete-time switched affine system
Temporal-difference search in Computer Go
Sampled fictitious play for approximate dynamic programming
Chaotic dynamics and convergence analysis of temporal difference algorithms with bang-bang control
Empirical Dynamic Programming
A simulation-based approach to stochastic dynamic programming